Wednesday, January 21, 2009

HadCRUT3: autocorrelation and records

Eduardo Zorita was kind enough to look at my previous
calculations of autocorrelation and frequency of clustered records
that used the GISS data.

Because I claim that the probability of their clustered records is above 6% while they claim it to be below 0.1%, and because both of us know that my higher result is caused by the higher autocorrelation of my random data relative to theirs, Dr Zorita told me that he thought my autocorrelation was much higher than the observed one.



However, that is not the case. The only correct statement would be that my autocorrelation is higher than theirs. But my autocorrelation matches the observations while theirs is much lower than the observed one.

One of the technical twists of our discussion has been that I switched to the HadCRUT3 monthly data. We have much more resolution here: 159 * 12 = 1908 months of global (and other) temperature anomalies. In a new
Mathematica notebook (PDF preview),
I did several things:



  • imported the 1908 HadCRUT3 monthly anomalies (importing of such TXT tables is really smooth with Mathematica)
  • drew the temperature graphs and the autocorrelation graph for lags between 0 and 500 months.
More importantly, I had to design a model that reproduces the two graphs above. The optimal model is made out of two AR(1) processes. The formulae for the "damped random walk" were explained in the previous article. You should realize that the jumps and damping now refer to a monthly evolution rather than an annual one.

The nontrivial function hiding in my model, "arone", only has one parameter, "damping", besides the number of months "l=1908". My optimal process has two components (a minimal sketch of this construction appears right below the list):
  • damping=0.85 (almost white noise): SD of monthly jump=0.07 °C
  • damping=0.99999 (almost unregulated random walk): SD of monthly jump=0.015 °C
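For concreteness, here is a minimal sketch of this construction in Mathematica. The names "arone" and "damping" (and "l=1908") follow my notebook; the wrapper "temperature" with its optional linear trend (in °C per century, discussed below) and the hard-wired jump SDs are just an illustration, not a copy of the actual code.

    l = 1908;  (* number of months *)

    (* damped random walk = AR(1) process: each month, the previous anomaly is
       multiplied by "damping" and a Gaussian jump with SD "jumpSD" is added *)
    arone[damping_, jumpSD_] := Module[{x = ConstantArray[0., l]},
      Do[x[[n]] = damping x[[n - 1]] +
          RandomVariate[NormalDistribution[0, jumpSD]], {n, 2, l}];
      x]

    (* sum of the fast and slow components, plus an optional linear trend *)
    temperature[trendPerCentury_: 0] :=
      arone[0.85, 0.07] + arone[0.99999, 0.015] + trendPerCentury Range[l]/1200.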
I can also add a linear trend to the sum of these two AR(1) processes. With these choices, I get an agreement with the HadCRUT3 autocorrelation. For example, one of the first autocorrelation graphs was this one (the upper image is the actual random temperature graph):



Here, the red curve is the HadCRUT3 autocorrelation, the typical number between -1 and +1 for two "vectors" of monthly data that are shifted by a lag of "x" months. The blue curve is the same quantity calculated from my double AR(1) random model. Looks very good, doesn't it? In some models, the blue curve is slightly above the red curve. In others, like this one, it's the other way around. There is surely no "obvious" or "systematic" discrepancy here.
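If you want to redo the comparison, the lagged autocorrelation is literally this correlation coefficient. A sketch - assuming the monthly anomalies have already been imported into a list that I call "hadcrut3" here (a made-up name) and reusing the illustrative "temperature" generator above:

    (* correlation of the series with a copy of itself shifted by "lag" months *)
    autocorr[data_, lag_] := Correlation[Drop[data, -lag], Drop[data, lag]]

    acf[data_, maxLag_] := Table[autocorr[data, lag], {lag, 1, maxLag}]

    (* red = observed HadCRUT3, blue = one realization of the double AR(1) model *)
    ListLinePlot[{acf[hadcrut3, 500], acf[temperature[], 500]},
      PlotStyle -> {Red, Blue}]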

At any rate, the sum of the two AR(1) processes seems to agree with the statistical characteristics of the observed 159 years of data. Note that the autocorrelation (we're describing the red, HadCRUT3 curve) drops from 1 at zero lag to about 0.7 at a lag of 20 months or so, and then falls towards roughly 0.3 around 400 months of lag (even though the details of the behavior there depend on chance). The model seems to reproduce this.

Clearly, the fast decrease of the autocorrelation for small lags is guaranteed by the AR(1) process with damping 0.85, while the almost undamped random walk is responsible for the slower decrease afterwards.
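This follows from the textbook fact that the autocorrelation of a single AR(1) process with coefficient a at lag k is simply a^k:

    \rho_{\rm AR(1)}(k) \;=\; a^{k}: \qquad 0.85^{20} \approx 0.04, \qquad 0.99999^{500} \approx e^{-0.005} \approx 0.995

so the fast component has essentially forgotten its past after a couple of years, while the slow one barely notices even 500 months.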

The model (with no linear trend added) also has a good long-term behavior. Over hundreds of thousands of years, the temperature fluctuates within an 8 °C window or so: the number 0.99999 is far enough from 1 to have this regulating effect (because of the square roots). That means that the "freedom" we give to the temperature at long time scales doesn't exceed the limited fluctuations known from the glaciation cycles. But the inequality may be rather close to saturation: most of the ice ages and interglacials could be due to this random-walk dynamics.
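The "square roots" work as follows: the stationary standard deviation of an AR(1) process with coefficient a and jump SD sigma_jump is

    \sigma_\infty \;=\; \frac{\sigma_{\rm jump}}{\sqrt{1-a^{2}}}
      \;\approx\; \frac{0.015\;{}^\circ{\rm C}}{\sqrt{1-0.99999^{2}}}
      \;\approx\; \frac{0.015\;{}^\circ{\rm C}}{0.0045}
      \;\approx\; 3.4\;{}^\circ{\rm C}

so the slow component typically wanders a few degrees in either direction, which adds up to a window of very roughly 8 °C without any external regulation.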

Fine, now that I have a satisfactory model, I can generate a lot of random "planets" - random monthly datasets from the prescription above. Then I calculate the annual average data for each and check which of them have at least 13 record years among the last 17 years. The result is 6% or so, for reasonable choices of the trend, in agreement with my previous announcements.
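A sketch of this counting step - the helper names ("annualMeans", "recordQ", "alarmistQ") are mine and only illustrate the logic; the bookkeeping in the actual notebook differs in details:

    (* annual averages of a 1908-month series: 159 years *)
    annualMeans[monthly_] := Mean /@ Partition[monthly, 12]

    (* is year y a record-hot year, i.e. warmer than all previous years? *)
    recordQ[annual_, y_] := annual[[y]] > Max[annual[[;; y - 1]]]

    (* "alarmist planet": at least 13 record-hot years among the last 17 *)
    alarmistQ[annual_] :=
      Count[Table[recordQ[annual, y], {y, Length[annual] - 16, Length[annual]}],
        True] >= 13

    (* fraction of random planets, here with a trend of +0.3 °C per century *)
    N@Mean[Table[Boole[alarmistQ[annualMeans[temperature[0.3]]]], {10000}]]

Here a year counts as a record if it is warmer than all previous years of the series, and the example run uses a trend of +0.3 °C per century.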

The effect of a linear trend

It is interesting to look at the influence of a linear trend that we may add to the two AR(1) processes. If the trend is zero, the probability of an alarmist planet (13 record years among the last 17) is around 4%: 2% for the record-hot planets and 2% for the record-cold planets.

If we choose the linear trend to be +0.3 °C per century, the ratio of record-hot and record-cool planets gets dramatically asymmetric: 5% of the planets satisfy the condition with the 13 record hot years in 17 recent years, while 0.5% satisfy the record cold condition.

If the trend is +0.5 °C per century, the probabilities shift to 8% and 0.3%. And for +1.0 °C per century, they go to 18% (hot) and 0.1% (cold). However, with such a huge linear trend, the autocorrelations are way too high. For most of the "planets", the autocorrelation doesn't drop below 0.9 even for 500 months of separation. Even though you may get a slightly higher probability of achieving 13 hot records, you pay a high price: the linear character of your predicted temperature no longer agrees with the observations, which are apparently dominated by noise (random walk).

If you fixed this problem by increasing the amplitudes of the noisy components, the probabilities would drop back, close to 6% for the total hot+cold ones.

Of course, in reality, the hypothetically significant enhanced greenhouse trend has only existed for 50 years or so, not for 159 years, but I think that even the more recent data are dominated by the "random walk" rather than a trend, and this fact may be demonstrated by the too-high autocorrelations of trend-dominated models. By the way, you can also see with your own eyes that the trend-dominated models have curves that are too smooth.

Let me summarize it by saying that if we only look at the HadCRUT3 record, the hypothesis that all the temperature data are made out of random walks - one of which is allowed to be undamped - is still alive. In all cases that are remotely consistent with the observations, the probability of achieving a sequence of hot records is substantial, comparable to several percent. This probability fails to be small enough to establish the existence of an underlying trend.

One time constant, many time constants?

Note that we have used two AR(1) processes with different values of "damping". The different values of "damping" mean different "time constants", i.e. periods of time after which the noise dissipates. There can exist, and there probably do exist, processes in Nature at many time scales that try to stabilize the noise. You can't associate them with a unique time constant.

The shorter time constant is comparable to a year while the longer one is equal to many centuries or millennia. Clearly, a kind of weighted harmonic average may be equal to five years and this average may be relevant for various calculations of the climate sensitivity and other things, see e.g. the paper by Stephen Schwartz.
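To quantify the two scales: the time constant of an AR(1) process with a monthly coefficient a is roughly tau = -1/ln(a), which is close to 1/(1-a) months, so

    \tau \;\approx\; \frac{1}{1-a}\ \text{months}: \qquad
    \tau_{\rm fast} \approx \frac{1}{1-0.85} \approx 7\ \text{months}, \qquad
    \tau_{\rm slow} \approx \frac{1}{1-0.99999} = 10^{5}\ \text{months} \approx 8000\ \text{years}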

On the other hand, other questions demand a different kind of average of these time constants, for example one that is closer to the arithmetic average. More concretely, the statistics of achieving many hot records in 1-2 decades is primarily controlled by (the trend and) the time constant of the slowest component of the random walk. Because this "damping" parameter is equal to 0.99999, which is close enough to one, the temperature graphs at the centennial scale can be thought of as a random walk.

Many people find it counterintuitive to imagine that the temperature is allowed to go in either direction. But there's nothing inconsistent or scientifically invalid about it, much like there is nothing impossible about the currency exchange rates going persistently and madly in one direction - and later, equally persistently and madly, in the opposite direction.

For example, the total cloudiness of the atmosphere is a parameter that can take many different values, and the mechanisms forcing the global overall cloudiness to return to the "correct" value are just extremely inefficient and slow. Just try to calculate the average cloudiness of the atmosphere from the first principles: it will be difficult to figure out where to start because it can be almost anything. In other words, cloudiness has a lot of freedom to change in any direction it wants and to do so for many centuries or millennia. Only when the deviations from the "normal value" become as large as what they usually are after many millennia does "someone" start to regulate them.

This is almost certainly how the atmosphere works. Not only alarmists but also many skeptics fail to understand that many quantities describing the climate are "free" degrees of freedom that can "walk" in any direction they want and influence everyone else. It's not true that each change has a clear, deterministic cause, and it's not true that the causes whose character is random must "average out" after a few years.

According to all the sensible models I can imagine to match the data, the typical "random" change of the temperature that we get every century is 0.5 - 1.0 °C, so there is nothing unusual about our seeing the very same thing in the 20th century. There is no observational evidence in the instrumental data for an underlying trend that would exceed the noise (random walk). Only if the temperature change per century became much greater than 1 °C - something like 3 °C, to approach the 4-sigma territory - would we have a reason to think about the underlying trend and perhaps even be concerned. That is surely not our situation in 2009.

And that's the memo.
