Friday, March 12, 2010

Tamino vs random walk

Grant Foster has found a new inconvenient enemy, a random walk:
Not a Random Walk
He starts by saying that Euler has "proven" God by the following assertion:
Sir, (a + b^n) / z = x, hence God exists. Reply!
In other words, Euler emitted some mathematical nonsense and considered it a proof of God. (See the slow or fast comments for details about this story.)

Off-topic: Local Theory of Relativity. Hat tip: Olda K. ;-)

Tamino's article continues as expected. Tamino emits some mathematical nonsense and considers it a proof of man-made global warming. His multi-kilobyte verbal emissions make no sense, except for making one point.

He's trying to "debunk" a comment by a reader who argued that the measured temperature series may be represented as a random walk - which is a much better model of the measured data than a linear trend combined with white noise.

Of course, Tamino doesn't like it because it undermines the power of the AGW religious cult he faithfully believes in. So he tries to produce some counter-arguments. The only counter-argument he produces is that random walks produce unbounded functions.

That's true but irrelevant. To obtain a random-walk-based model that produces bounded functions, it's enough to replace the random walk with an autoregressive model. Nonlinearities prevent the model from going "too far away" from the norm. However, over any reasonable timeframe, e.g. one shorter than 10,000 years, the model will be indistinguishable from a proper random walk.

Colors of noise

When we say that a function, "f(x)", resembles "white noise", it means that its values at different values of "x" are random and independent from each other. Such functions are inevitably completely discontinuous. If we use them as a model of temperatures, the temperature in the next year has nothing to do with the temperature of the previous year. It can suddenly jump to the temperatures seen in 1650.

Needless to say, this is not how temperatures behave. The temperatures are much more continuous than that. It's physically clear why it has to be so. One needs heat - energy - for the temperature of a large object to change. And heat - energy - can't flow infinitely quickly. We know this to be the case for theoretical reasons - and we also observe that quantities don't change infinitely quickly.

Is there a better, more continuous model of a random function (such as a temperature series) than white noise? You bet. It's pink noise, an interpolation between white noise and red noise. So I have to explain red noise to you first.

Red noise: random walk

A random walk is a function such that the increment, "f(x) - f(x-1)", is a random variable distributed symmetrically around zero. And for different values of "x", these increments are independent of each other. In other words, you may visualize a random walk as an integral of a white noise.
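The "integral of a white noise" picture can be sketched in a few lines. This is only an illustration under assumptions that are not in the text above: the steps are taken to be Gaussian, and the helper names `white_noise` and `random_walk` are made up for the example.

```python
import random
from itertools import accumulate

def white_noise(n, rng, sigma=1.0):
    """n independent Gaussian values with zero mean (assumed step distribution)."""
    return [rng.gauss(0.0, sigma) for _ in range(n)]

def random_walk(steps):
    """Running sum of the steps, starting from f(0) = 0."""
    return [0.0] + list(accumulate(steps))

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
steps = white_noise(10, rng)
walk = random_walk(steps)

# Differencing the walk, f(x) - f(x-1), gives back the white-noise steps.
recovered = [walk[i + 1] - walk[i] for i in range(len(steps))]
```

Any other step distribution that is symmetric around zero would serve equally well here.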

The character of the functions obtained as random walks is also called red noise - because it favors low-frequency (red) components more than the white (color-neutral) noise does - or Brownian noise because the Brownian motion of pollen particles driven by the chaotic motion of water molecules is the oldest example of a random walk in Nature (observed by biologist Robert Brown, explained by Albert Einstein in 1905 and Marian Smoluchowski in 1906).

Random walks have one property that is so important that I will prove it for you. How far will you get after time "t" of random walk? What I want to prove is that the distance scales like "sqrt(t)", the square root of time. Why is it so?

First, a random walk may drive the pollen particle to the left or to the right: "f(t)" may be both positive and negative. We only want to know the absolute value. It's much more natural to talk about the squared absolute value, i.e. the squared value of "f(t)" (the sign squares to a "plus" in both cases). Generate many random walks with the same parameters and ask: what is the average value of "f(t)^2" computed out of all these random walks? Imagine that "t" is the number of steps.

We have
f(t) = f(0) + step(0 → 1) + step(1 → 2) + ... + step(t-1 → t)
Let's choose the initial position "f(0)=0" and the steps of the random walk are random numbers equally likely to be positive and negative. The average value of the square is
[f(t)^2] =
= [(step(0 → 1) + ... + step(t-1 → t))^2] =
= ...
Are you following me? The "[...]" brackets represent the average value over many random walks. And inside these brackets, I have simply written the square of "f(t)". The latter is just the sum of "t" different steps because it's a random walk.

You should recall the formula for the square of a sum. When you expand this formula, there will be many "mixed terms" such as
[ step(0 → 1) . step(5 → 6) ]
However, the signs of the step from time 0 to time 1, and the step from time 5 to time 6, are independent of each other, by the definition of the random walk. So the average values of the type above are just zero! The positive contributions cancel against the negative ones. Consequently, only the squared individual terms contribute to the average value. Returning to the previous displayed equation, we learn that
[f(t)^2] =
= [(step(0 → 1) + ... + step(t-1 → t))^2] =
= [step(0 → 1)^2 + ... + step(t-1 → t)^2] =
= t . [step(0 → 1)^2]
The average or "expectation value" we were computing is simply given by "t" (because there were "t" equal terms) times a universal parameter describing the typical step. It's more natural to take the square root of the result above:
typical value of f(t) = sqrt( [f(t)^2] ) =
= sqrt(t) x const
That's great. After time "t", the random walk drives you to a distance that is proportional to the square root of "t". A key formula for the random walk.
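The square-root law is easy to check numerically. The sketch below is a rough Monte Carlo test, not part of the derivation: it assumes steps of exactly +1 or -1 (so the constant is 1), and the function name `rms_distance`, the seed, and the sample sizes are arbitrary choices for the example.

```python
import math
import random

def rms_distance(t, n_walks, rng):
    """Square root of the average of f(t)^2 over n_walks walks with +1/-1 steps."""
    total = 0.0
    for _ in range(n_walks):
        position = 0
        for _ in range(t):
            position += 1 if rng.random() < 0.5 else -1
        total += position ** 2
    return math.sqrt(total / n_walks)

rng = random.Random(1)  # fixed seed, chosen arbitrarily

# If the typical distance scales like sqrt(t), each ratio should be close to 1.
ratios = {t: rms_distance(t, 2000, rng) / math.sqrt(t) for t in (25, 100, 400)}
for t, ratio in ratios.items():
    print(t, round(ratio, 2))
```

The ratios fluctuate by a few percent from run to run because only a finite number of walks is averaged.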

Matching temperature jumps with the random walk

You can actually see that this is a very good model for the most typical temperature changes (e.g. of the global mean temperature) after time "t" as long as "t" is shorter than 10,000 years or so. Just use this formula:
Typical temperature jump after time "t"
= sqrt(t / year) * 0.08 °C
You express the time in years, take the square root of it, and multiply the result by 0.08 °C. Look what it gives you for various values of "t":
1 year: 0.08 °C
10 years: 0.25 °C
100 years: 0.8 °C
1000 years: 2.5 °C
10,000 years: 8 °C
This table by itself looks excellent. The final figure, 8 °C, is the typical difference between the temperature during ice ages and interglacials. The middle figure, 0.8 °C, is approximately the estimated warming in the last 100 years. It just seems to work beautifully.
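The table above can be reproduced directly from the formula. A minimal sketch follows; the constant name `STEP_PER_SQRT_YEAR` and the function name `typical_jump` are invented for the example, while the value 0.08 is the amplitude quoted above.

```python
import math

STEP_PER_SQRT_YEAR = 0.08  # degrees C, the amplitude quoted in the text

def typical_jump(years):
    """Typical temperature change in degrees C after `years`, using the sqrt(t) rule."""
    return STEP_PER_SQRT_YEAR * math.sqrt(years)

table = {years: typical_jump(years) for years in (1, 10, 100, 1000, 10_000)}
for years, jump in table.items():
    print(f"{years:>6} years: {jump:.2f} deg C")
```

The printed values carry two decimals, so the 10-year and 1000-year rows show slightly more digits than the rounded figures in the table.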

For timescales longer than 10,000 years, the observed temperature changes stabilized. They were never much bigger than those 8 °C, even during the millions of years. At the level of our theoretical model, it's easy to add nonlinear terms that discourage too big deviations from some idealized mean - deviations by more than 10 °C. Such terms can also be given a natural physical explanation. They're negative feedbacks of various types. They "complexify" the simple random walk model into an autoregressive model.
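To see how a restoring term caps the spread, one can compare the random walk with the simplest autoregressive model, a linear AR(1) process x[n] = PHI * x[n-1] + noise. This is a deliberately crude stand-in for the nonlinear feedbacks mentioned above, and the value of `PHI` is an assumption picked so that the spread saturates at 8 °C; neither is taken from any fit to data.

```python
import math

SIGMA = 0.08  # degrees C per yearly step, the amplitude used in the table
# Assumed coefficient: chosen so the long-run spread SIGMA / sqrt(1 - PHI^2) is 8.
PHI = math.sqrt(1.0 - (SIGMA / 8.0) ** 2)

def walk_std(t):
    """Standard deviation of a random walk after t yearly steps."""
    return SIGMA * math.sqrt(t)

def ar1_std(t):
    """Standard deviation after t steps of x[n] = PHI * x[n-1] + noise, x[0] = 0."""
    return SIGMA * math.sqrt((1.0 - PHI ** (2 * t)) / (1.0 - PHI ** 2))

for t in (1, 100, 10_000, 1_000_000):
    print(t, round(walk_std(t), 3), round(ar1_std(t), 3))
```

With this choice the two models agree to within a percent at a century, differ noticeably by 10,000 steps, and the AR(1) spread then levels off while the random walk keeps growing.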

Now, if you study the detailed numbers more accurately, you will find out that the actual color of the noise describing various temperatures is not quite red: it's somewhere in between the white noise and the red noise, even when you compare e.g. the variations after centuries with those after decades.

We need to understand some basic power laws in a more coherent and unified way. The typical temperature jump is proportional to a power of the time separation. The exponent depends on the color of the noise:
white noise: jump in T goes like t^0
red noise: jump in T goes like t^(0.5)
pink noise: jump in T goes like t^(0.25)
The white noise had the exponent equal to zero. Anything to the zeroth power equals one: so the zeroth power means that the temperature jump was time-independent for the white noise. No matter how much time you wait, you will deviate from the original temperature by the same step.

Similarly, the 0.5-th power is the square root that I derived for the random walk - or red noise - previously.

Pink noise is a technical name for a random function whose typical jump as a function of time scales as a power law in "t" - a power law with an exponent exactly in between the white noise and the red noise. You can create "pink noise functions" e.g. by producing "white noise" for the Fourier components, multiplying them with a proper exponent of the frequency (corresponding to the desired power law), and by Fourier-transforming them back to the time variable.
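The Fourier recipe just described can be sketched with the standard library alone. Several things here are assumptions of the example rather than statements from the text: the function names, the Gaussian coefficients, and the relation beta = 2H + 1 between the spectral exponent beta (power falling like 1/f^beta) and the jump exponent H, which holds for 0 < H < 1. Under that relation the t^(0.5) case is beta = 2 and the t^(0.25) case is beta = 1.5, while beta = 0 gives white noise.

```python
import math
import random

def power_law_noise(n, beta, rng):
    """Real series of length n whose power spectrum falls off like 1/f^beta.

    Each Fourier mode k gets independent Gaussian ("white") coefficients,
    scaled by k^(-beta/2); summing the modes is the inverse transform.
    The zero-frequency and Nyquist modes are left out for simplicity.
    """
    series = [0.0] * n
    for k in range(1, n // 2):
        amplitude = k ** (-beta / 2.0)
        a = rng.gauss(0.0, 1.0) * amplitude
        b = rng.gauss(0.0, 1.0) * amplitude
        for i in range(n):
            angle = 2.0 * math.pi * k * i / n
            series[i] += a * math.cos(angle) + b * math.sin(angle)
    return series

def rms_jump(series, lag):
    """Root-mean-square change of the series over a separation of `lag` samples."""
    diffs = [series[i + lag] - series[i] for i in range(len(series) - lag)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

rng = random.Random(2)  # fixed seed, chosen arbitrarily
for name, beta in (("white", 0.0), ("in between", 1.5), ("red", 2.0)):
    x = power_law_noise(256, beta, rng)
    print(name, round(rms_jump(x, 32) / rms_jump(x, 1), 2))
```

The printed number is the ratio of the typical jump over 32 samples to the typical jump over 1 sample; it should stay near 1 for white noise and grow as beta increases. The double loop is quadratic in n, so a fast Fourier transform would be the practical choice for long series.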

Various temperature series are pretty close to pink noise, but you must be prepared for the fact that the detailed exponent depends on the context, on the local or global character of your measurements, and on the timescale "t" itself.

Also, there are big irregularities if "t" is just a few years because the El Nino / La Nina dynamics is not quite random. It is "more periodic" than a typical random function - although El Ninos and La Ninas surely don't alternate "exactly periodically".

At any rate, it is straightforward to construct statistical models based on pink noise, perhaps with some autoregression included, that describe all the statistical features of the measured data very well. Recall that I "predicted" the UAH temperature until 2100:

Note how natural and continuous this extrapolation beyond 2010 looks. Whenever someone tells you that he has a nice theory of climate change - of the variations of the temperature - you should make some basic tests. You should check how large the "typical jump" of the temperature is, and how it depends on the time separation "t". In intervals ranging from a few years up to a century, the dependence will closely match a power law. The exponent is interesting and you should compare it between the models and the observations, too.
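The "typical jump" test can be written down concretely. The sketch below estimates the exponent as the least-squares slope of log(typical jump) against log(time separation); the function names, the choice of lags, and the synthetic test series are all assumptions of the example, and no real temperature data is used.

```python
import math
import random

def rms_jump(series, lag):
    """Root-mean-square change of the series over a separation of `lag` samples."""
    diffs = [series[i + lag] - series[i] for i in range(len(series) - lag)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def jump_exponent(series, lags):
    """Least-squares slope of log(rms jump) against log(lag)."""
    xs = [math.log(lag) for lag in lags]
    ys = [math.log(rms_jump(series, lag)) for lag in lags]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator

# Synthetic check: white noise and its running sum (a random walk).
rng = random.Random(3)  # fixed seed, chosen arbitrarily
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
walk = []
position = 0.0
for step in noise:
    position += step
    walk.append(position)

lags = (1, 2, 4, 8, 16, 32)
print("white noise:", round(jump_exponent(noise, lags), 2))
print("random walk:", round(jump_exponent(walk, lags), 2))
```

Applied to a measured series or to model output, the same function gives an exponent to compare; the estimate becomes noisy when the largest lag is not small compared with the length of the series.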

There is a lot of deep physical information that can be obtained by a proper statistical analysis of the time series. Most of it is thrown away by the IPCC types. They're not interested in any of that. They're not interested in the exponents that determine the color of the noise, the correlation lengths, the correlation times, and the processes that are responsible for them. They're interested in a "trend" which has really nothing to do with the variability, because a description in terms of a "trend" assumes that the deviation of the temperature from a linear function is white noise - but it surely is not. (They try to identify that linear function with the climate even though no such function plays any important role in the climate, and no one knows what this function should look like, anyway.)

The real deviation of the temperature from a "normal long-term temperature", whatever you choose it to be, is close to a pink noise and the precise parameters of this pink noise, especially the amplitude and the exponent, are extremely important. They're what the climate variability is all about. They're the most important parameters whose predictions by the theories and models should be tested against the observations.

I challenge Grant Foster or anyone else to find the simplest "pinkish noise" model of whatever temperature series they like. Fine-tune the amplitude and the exponent. Assume that the temperatures are "pinkish noise" given purely by these two simple parameters. And try to falsify this theory by a statistical test.

To be honest, I actually know how to solve this task because I have learned not only the amazing depth included in this simple model but also many of its limitations. The real climate system is more complex and similar power laws, while very useful in many intervals, break down beyond their range of validity, much like any phenomenological theory you can invent. In various regimes, you may also see that there are probably "two different kinds of noise" with two different exponents being added.

But these pinkish noise models are still extremely good, and infinitely more accurate descriptions of the real observations than a "linear trend plus white noise" model that the AGW types would like the people to buy. The white noise is perhaps easier for the laymen to understand than the red noise, and certainly more so than the pink noise, which is why people often buy this naive theory of the climate (trend plus white noise). That's a pity because a priori, it's a very ill-motivated physical model. And a posteriori, it's a model that doesn't agree with the observations well.

More generally, I want to emphasize that a working predictive theory of the climate can never "neglect" the "natural variability". Even if there were an important man-made trend, it's clear that the climate variability is important, too. It's damn important to know how large it is and how it depends on the timescale (and distances), so that we also know the typical timescale where the "man-made linear trend" could start to beat the natural variability. We surely know that this timescale is longer than 15 years: there's been no statistically significant warming since 1995, so natural factors can evidently beat the man-made effects for 15 years. But it may be much longer.

So even if the man-made trend existed and were large, it's completely self-evident that most of the research concerning "climate change" would have to focus on the variations which are obviously of natural origin and they have always existed, pretty much with the same amplitudes and exponents. The climate has been around for billions of years and it's been changing for billions of years.

You can learn exactly nothing if you deny all of this and you only focus on some hypothetical, politically motivated term that only existed for 100-150 years. The Earth's long history doesn't deserve to be denied in this way. While the Young Earth people at least admit that the Earth is older than 5,000 years, the AGW cultists want to deny any history of the climate before the year 1850 or so.

People like Grant Foster are just looking at the climate in a completely wrong way. It is a way based on pre-existing dogmas - and a way that prevents one from ever learning anything besides these dogmas. It's a method that discourages people from studying anything that looks "random": all "random" things are just inconvenient and should be thrown away. They only want to see "non-random" signals which can never be isolated in the real climate because most of it is random and the details of how it is random matter.

In other words, in the real climate, these random phenomena are the essence. And the color, character, and consequences of their "random" behavior hide the most important keys to climate change, the most important collection of material that our theories should agree with. And they may potentially give us all the predictions of a future climate, too.

And that's the memo.


  1. The anecdote concerning Euler's "proof of God", as I recall, has Euler at a party where the hostess warned him that another rather obnoxious guest was present. This other guest, who I think was also famous though his name escapes me, was an outspoken but mathematically challenged atheist. Euler approached him, uttered his irrelevant mathematical expression, and demanded that the other man refute it.

  2. The anecdote about Euler, repeated by Eric Temple Bell in "Men of Mathematics," has been shown by historians to be apocryphal and likely dismissed as absurd by the intended audience even if it was true.

    Lubos is correct about pink noise modeling and I think the following is true - for the exponent a on time t for 0<a<1, the corresponding Fourier series representation of the function is absolutely convergent everywhere (for a=0 there are functions with divergent Fourier series everywhere)

  3. Lubos,

    The increase in global avg temp since 1880 is not merely random: A random increase would cause a negative energy imbalance or the extra energy would have to come from another segment of the climate system (eg the ocean, cryosphere, etc). Neither is the case: There is a positive energy imbalance and the other reservoirs are also accumulating energy.

    I think it is fair to conclude that the observed increase therefore has not been random.

    Moreover, there is a known positive radiative forcing; measurements indicate an enhanced greenhouse effect (more IR being emitted to space and more IR being reflected back to the surface); changes in other segments of the earth system also indicate global warming.

    A different question would be, if these data, without any physical constraints on it, could mathematically be described as "VS" does (purely stochastic; random walk). Perhaps it could; I refrain from an opinion on that. I do note though, that in light of the physical system that these data are a part of, this is a purely academic mathematics question. The physics of it all tells me that it hasn’t in fact been random, since it is inconsistent with other observations.

    I provide a longer reasoning at my blog:

    Bart Verheggen

  4. Dear Bart,

    your "argument" that the changes are not random is completely nonsensical. You say that there's a positive energy imbalance. Well, I am not really sure.

    But even if it is, the probability for the imbalance to be positive given a random evolution - e.g. for a Markov process - is 50%. So that's not shocking that it's positive. It can only be positive or negative. And by the way, if it were negative, you would conclude the (almost) same thing - but the "worry" would be global cooling.

    After all, this is not just a speculation because this is what some people were worried about 35 years ago.

    So while I know methods to exclude various excessively simple random models, you haven't offered any valid argument - not even a glimpse of an argument - that would suggest that the temperature series were not random (similar to pink noise).


  5. I have just posted a series of statistical tests on this issue:

    ADF, KPSS, PP, and DF-GLS and the results are listed on Bart's blog in the comments.

    Everybody is welcome to take a look and inspect my method / arguments, here:

    Summarizing: Using these different statistical tests, we find:

    ADF: Clear presence of a unit root

    KPSS: Stationarity (no unit root) rejected at 5% and 10% sig, not at 1% sig.

    PP: No presence of unit root, but only when using 'intercept and trend' in the test equation specification

    DF-GLS: Clear presence of a unit root

    In my view, this is enough statistical evidence to conclude that the GISS combined surface and land record is in fact integrated of the first order, or I(1).


  6. An illuminating post. In other words "climate science" is riddled with confirmation bias.