Wednesday, November 16, 2011

James Hansen and 3-sigma "proofs"

Last week, Eric Berger mentioned a paper by the notorious recidivist James Hansen,

Climate Variability and Climate Change: The New Climate Dice,
written together with Sato and Ruedy.



Summers in Moscow will be the heroes of this article. I visited the Russian capital in Summer 1992 (International Math Olympiad): a fun experience.

The paper contains some temperature maps of the globe, comments about Heaven and divine interventions, absurd speculations about "extreme weather", and the usual announcements that the Earth will die on the day after tomorrow. But I was intrigued by – and decided to discuss – the authors' would-be quantitative "proof" of global warming based on the 2010 heat wave in Moscow.

Let me say in advance that the degree of their sloppiness is so spectacular that they should have been kicked out of college during the first quantitative exam they attempted to face. Today, it's too late and we must watch how similar mediocre slackers have contaminated – and continue to contaminate – the scientific community.




Let us look at the key paragraph I am going to discuss in quite some detail. They argued:
Thus there is no need to equivocate about the summer heat waves in Texas in 2011 and Moscow in 2010, which exceeded 3σ – it is nearly certain that they would not have occurred in the absence of global warming. If global warming is not slowed from its current pace, by mid-century 3σ events will be the new norm and 5σ events will be common.
This is a pretty big package of sloppiness. When you read these simple clichés, you may identify that their statement is based on (or requires) several assumptions needed for it to become logically justifiable, especially the following three:
  1. In the absence of global warming, the summer temperatures in Moscow, one value per year, are distributed according to the normal distribution.
  2. The summer temperatures in two different years (e.g. in two consecutive years) are uncorrelated: the temperature graph has no autocorrelation.
  3. The number of cities like Moscow or regions such as Texas on the planet Earth is approximately equal to one, so that rare events in Moscow or Texas are also rare events on Earth; in other words, one may neglect the so-called look-elsewhere effect.
It would be enough if one of the three assumptions above were incorrect and the whole Hansen-Sato-Ruedy line of argument would be invalidated. However, the reality is much more severe: all of the three assumptions above are invalid. When corrected, each of them happens to dramatically weaken the Hansen et al. argument. In the rest of the text, I will discuss why all the three assumptions above are critically needed for their argument to be defensible; and why all the three assumptions are wrong.

Let me start with the "idealized" picture of the world, as probably imagined by the three authors. The summer temperature in Moscow in the year X is independent of the summer temperature in Moscow in all previous (and following) years. It can be any number but it is normally distributed, with some mean value (mean summer temperature) and some "width" or "variance" or "standard deviation". Moreover, all cities and regions in the world have to obey the orders from Moscow. Well, this was almost the case in half of the world for half of the last century but, truth be told, the disobedient Soviet bloc still failed to exactly copy the weather in Moscow.



With these assumptions, the probability that the temperature deviates by more than three times the standard deviation, or 3σ, from the mean value is just 0.3% (more precisely, about 0.27%, counting both tails): this is the standard calculus of the Gaussian (normal) distribution. Because in climate science 0.3% may be approximated by zero, the observations in Moscow couldn't have occurred given the assumptions. It proves that global warming is real, man-made, cataclysmic, and the deniers have to be sent to Siberia.

(Well, obviously, nothing really shows that the "surprising" effect would be man-made: there's no evidence for "man-made global warming" here. But my main point is stronger, namely that there's no evidence for "any global warming" or "any surprising event that isn't expected in the noise" here, either, as long as the evidence is evaluated properly and rationally.)
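For readers who want to check the 0.3% figure, here is a minimal sketch of the Gaussian tail calculation. It only uses the standard library; the function name `two_sided_tail` is my own label for illustration, not anything taken from the Hansen et al. paper.

```python
import math

def two_sided_tail(k: float) -> float:
    """P(|X - mean| > k * sigma) for a normally distributed X (both tails)."""
    # For a standard normal Z, P(|Z| > k) = erfc(k / sqrt(2)).
    return math.erfc(k / math.sqrt(2.0))

# Print the two-sided tail probability for a few multiples of sigma.
for k in (1, 2, 3, 5):
    print(f"{k} sigma: {two_sided_tail(k):.3e}")
```

The 3σ row is the number that the whole argument leans on; everything that follows is about why this translation from "3σ" to "0.3%" can't be trusted for Moscow's summers.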

Non-Gaussianity

Even if you imagine a completely stable climate, it is not true that the temperatures are distributed normally. For example, it's known that the temperature variations become larger – and larger excursions from the mean value therefore become more likely – if the weather is cold. In particular, if the water freezes over, the heat capacity of the Earth's surface decreases – partly because the frozen water (ice) has a lower heat capacity, partly because ice can't transfer heat from one place to another by convection. You should check the Arctic temperatures and notice that the variations in the summer are vastly smaller than those in the winter.

This asymmetry between the behavior of the distribution at lower and higher temperatures isn't the only violation of the Gaussian nature of the curve. Quite generally, the actual distributions of the temperature are decreasing more slowly than the Gaussian curve – which decreases faster than exponentially – if you look at the extreme values (far left or far right tails of the distribution). If you think about it, you won't be able to apply the central limit theorem here because the temperature in Moscow isn't a (simple linear) sum of a large number of terms.

Because the distribution isn't really normal, one can't translate the deviation by 3σ to the probability of 0.3%. It simply doesn't work in this case. The actual probability that you would deviate by more than 3σ is much higher than 0.3%. But as you may guess, while the Gaussianity of the distribution is clearly an invalid assumption (even if we assumed that the climate is essentially stable), this wasn't the main problem with their line of argument.
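To get a feeling for how much the tails matter, one may compare the Gaussian with some heavier-tailed distribution of the same standard deviation. The sketch below uses the Laplace (double exponential) distribution purely as an illustrative assumption: nobody claims that Moscow's summers are Laplace-distributed, it is just a simple curve whose tails decrease exponentially rather than faster than exponentially.

```python
import math

def gaussian_tail(k: float) -> float:
    """P(|X| > k * sigma) for a normal distribution."""
    return math.erfc(k / math.sqrt(2.0))

def laplace_tail(k: float) -> float:
    """P(|X| > k * sigma) for a Laplace distribution (illustrative choice).

    A Laplace distribution with scale b has sigma = b * sqrt(2) and
    P(|X| > x) = exp(-x / b), hence P(|X| > k * sigma) = exp(-k * sqrt(2)).
    """
    return math.exp(-k * math.sqrt(2.0))

k = 3.0
print(f"Gaussian 3-sigma tail: {gaussian_tail(k):.4%}")
print(f"Laplace  3-sigma tail: {laplace_tail(k):.4%}")
print(f"ratio: {laplace_tail(k) / gaussian_tail(k):.1f}")
```

Even this mild change in the shape of the tails makes a "3σ event" several times more likely than the Gaussian formula suggests.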

Two much more far-reaching mistakes in their logic are that they neglected autocorrelation entirely; and that they neglected the existence of cities in the world other than Moscow (and of regions other than Texas).

Autocorrelation

If you study Moscow's temperature in a particular year X and the following year X+1, imagine 1962 and 1963, you could think that the summer weather during these two years is completely independent. However, you must know that this assumption can't be quite correct. Many warm years tend to clump together, like the 1930s in the U.S.; and many cold years tend to do the same thing, like the 1960s.

A statistical evaluation makes this point clear. Can you understand the origin of this clumping theoretically? You bet. Statisticians would talk about the autocorrelation of the temperature graph. Where does it come from? Well, it boils down to the continuity of the temperature.

Even though the temperature in Moscow is changing, if you look at it with a good enough resolution, it is a continuous function. If you have a very warm year and you want to get a very cold year right after that, you need to get rid of lots of heat. However, the rate of energy transfer can't be infinite. It's still finite; I don't have to explain to you that this "inertia" in the temperature is ultimately due to "heat capacity". Chances are that you won't be able to change the temperature by "too much". It's always easier for the climate to depart from the initial temperature if it has a longer time to do so.

I have stressed this point in many previous TRF articles. At some level, a good qualitative description of the temperature is the Brownian motion, or a random walk. It is a continuous function which is however non-differentiable almost everywhere. The total temperature deviation after time T goes like a constant times the square root of T; if you divide the temperature change by T to get the warming rate or the cooling rate, the rate scales like one over the square root of T. In the real climate system, the power isn't quite 0.5 and you get some kind of a "pink noise" and it's combined with many other complex features of the climate but many qualitative lessons from the "random walk model" are deep and important.
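The square-root scaling of the random walk is easy to check numerically. What follows is a minimal sketch under loud assumptions: unit Gaussian steps, one step per "year", and a hypothetical helper `rms_displacement` that I made up for this illustration. It is a toy model of the qualitative lesson, not a climate model.

```python
import math
import random

def rms_displacement(steps: int, walks: int = 2000, seed: int = 0) -> float:
    """Root-mean-square displacement of a random walk after `steps` steps.

    Each step is an independent Gaussian with unit variance (an assumption
    of this toy model); the average is taken over `walks` independent walks.
    """
    rng = random.Random(seed)
    total_sq = 0.0
    for _ in range(walks):
        pos = 0.0
        for _ in range(steps):
            pos += rng.gauss(0.0, 1.0)
        total_sq += pos * pos
    return math.sqrt(total_sq / walks)

# The simulated RMS deviation should track sqrt(T), so the average
# "warming rate" (deviation / T) falls off like 1 / sqrt(T).
for t in (25, 100, 400):
    rms = rms_displacement(t)
    print(f"T={t:4d}  rms={rms:7.2f}  sqrt(T)={math.sqrt(t):6.2f}  rate={rms / t:.3f}")
```

Quadrupling T should roughly double the typical deviation and halve the typical rate, which is the scaling described above.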

So the warm years tend to clump and so do the cold years. Because of that, it is very likely that even if the long-term standard deviation of the summer temperature from the normal is σ, you will underestimate the magnitude of σ if you look at a couple – or a few dozen – of years only. Those years' temperatures are closer to each other (than a more extensive group of more distant years) because of the temperature's inertia: so the real σ that you would get in the long run may easily be twice as large or larger than the σ that you obtain from a finite period of time, e.g. from 50 years, because of the autocorrelation of the temperature.

That's why the 3σ should be understood as 3σ calculated from the short-term or medium-term variations of the temperature. The real long-term σ may be larger, e.g. twice the short-term σ, which would mean that those 3σ (short-term) really amount to just 1.5σ (long-term) and the 99.7% confidence level would drop to about 87%.
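The underestimate of σ from a short window may be demonstrated with a toy autocorrelated series. The sketch below uses an AR(1) process, x[t] = φ·x[t−1] + noise, with φ = 0.98 and 50-point windows. Both numbers are illustrative assumptions chosen to make the effect visible; they are not fitted to any Moscow record, and the real climate is closer to "pink noise" than to a clean AR(1) process.

```python
import math
import random

def std(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def ar1_series(n, phi, seed=0):
    """AR(1) series x[t] = phi * x[t-1] + eps[t] with unit-variance noise."""
    rng = random.Random(seed)
    # Start from the stationary distribution so there is no warm-up drift.
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi * phi))
    out = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def sigma_ratio(n=100_000, window=50, phi=0.98, seed=0):
    """Long-run sigma divided by the average sigma seen in short windows."""
    series = ar1_series(n, phi, seed)
    long_sigma = std(series)
    windows = [series[i:i + window] for i in range(0, n - window + 1, window)]
    short_sigma = sum(std(w) for w in windows) / len(windows)
    return long_sigma / short_sigma

print(f"autocorrelated (phi=0.98): ratio = {sigma_ratio():.2f}")
print(f"uncorrelated   (phi=0.00): ratio = {sigma_ratio(phi=0.0):.2f}")
```

With no autocorrelation the ratio stays close to one; with strong autocorrelation the short windows systematically see a smaller σ than the long run does, which is exactly why a "3σ" measured against a few decades overstates the significance.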

But I haven't explained the third fallacy which is arguably the most important one: James Hansen et al. "deduce" huge conclusions out of a 3-sigma effect. The 3-sigma effects are well-known in particle physics and as the frequent readers of this blog (or other blogs: and especially physicists who don't need to read any blogs) know very well, most of the 3-sigma signals ultimately go away. The probability that a 3-sigma result survives is much smaller than the "naively expected" 99.7%.

Much of the reduction may be attributed to the "publication bias": people are cherry-picking "interesting" findings and it is inevitable that they get many "false positives" if they look at many places. Even if the number of "true positives" (discoveries) that await us is zero, we are statistically guaranteed to get "false ones" from fluctuations if we try to make discoveries sufficiently many times.

A part of this problem may be brought under quantitative control and the relevant keyword is:

Look-elsewhere effect

Even if we don't cherry-pick papers and we only work with the data in one experiment or graph (one-dimensional or two-dimensional), there is a big risk that we overestimate the importance of 3σ deviations. Why? If we look at a very long graph – whose length is L – and a 3σ deviation has the width of W, such a deviation may occur roughly at L/W distinguishable places. Just try to squeeze bumps of size W into an interval of size L.

If L/W is high and these candidate places for bumps are independent of each other, it's pretty much guaranteed that some of them will see large deviations. If you participate in a lottery for a very long time, you will ultimately win at least the smallest prize at least once.

Assume that the previous problems – non-Gaussianities and autocorrelations – didn't exist. Study the temperature in Moscow. In 2010, it was found to be 3σ above the normal temperature. The temperature is normally distributed and not autocorrelated at all. What is the probability that this occurs in Moscow? Well, yes, it is 0.3% per year. On average, such a situation will occur roughly once in 300 years. Just to be sure, the probability that this happens either in 2010 or 2011 is 0.6% and the probability that it occurs in Moscow sometime in the last 50 years is 15% (0.3% times 50) or so.
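The "0.3% times the number of years" shortcut is only the leading approximation of the exact formula for independent years, 1 − (1 − p)^N. Here is a minimal sketch comparing the two; the helper name `prob_at_least_once` is my own and the per-year probability of 0.3% is the idealized Gaussian figure used in the text.

```python
def prob_at_least_once(p: float, trials: int) -> float:
    """Probability that an event of per-trial probability p happens at
    least once in `trials` independent trials."""
    return 1.0 - (1.0 - p) ** trials

p_year = 0.003  # idealized 3-sigma probability per year, as in the text

for years in (1, 2, 50, 300):
    naive = p_year * years
    exact = prob_at_least_once(p_year, years)
    print(f"{years:3d} years: naive {naive:6.1%}   exact {exact:6.1%}")
```

For a year or two the naive product and the exact result agree to many digits; for 50 years the exact value is a bit below the naive 15%, and the gap keeps growing as the product approaches 100%.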

However, we were just talking about the probability that Moscow will see above-the-normal temperatures. We must appreciate that Moscow wasn't a blindly selected city. Despite its importance, Moscow isn't the only city in the world that could have been used in a similar argument. Moscow was cherry-picked because it produced interesting results. The real question we should be asking is: What is the probability that we may find a region on the globe whose importance is comparable to the Moscow region and whose summer temperatures in a given year will surpass the mean value by 3σ?

This is a much more relevant question because this question really decides whether there is something "interesting" going on or not. Now, how many places are there on the globe? The local weather is kind of correlated. When it's cold in Cambridge, Massachusetts, it's usually cold in Boston, too. When you look at monthly temperatures, it makes sense to divide the globe into "effective" regions of size 1,000 km times 1,000 km. The weather of all places within the same square is "almost" perfectly correlated while disjoint squares behave almost independently. This is not an exact description of the reality and the estimate of 1,000 km isn't the exact result of a calculation (but I have done many analyses of this sort and the number isn't unrealistic) but it correctly captures the effective number of independent degrees of freedom, the number of different regions whose temperatures should be viewed as independent.

Note that the realistic square I mentioned has an area of 1 million square kilometers. The area of the globe is 510 million square kilometers, which is 510 times larger (or only about 1/3 of that if you consider the land alone and dismiss the stories about the oceans). If the probability of a summer temperature at least 3σ away from the normal were 0.3%, then the probability that at least one of the 510 square regions would experience such a summer is naively 0.3% times 510, which is about 150%. Well, this is too naive because the probability only rises approximately linearly as long as it is much lower than 100%. Then it increases more slowly, as it must, so that it never surpasses 100%. (If you consider the land, or 1/3 of the globe, only, the naive estimate is in the vicinity of 50%.)
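The counting of regions may also be checked by brute force. The sketch below is a small Monte Carlo under the idealized assumptions of this section: each of the 510 (or, for land only, 170) regions gets an independent, normally distributed summer anomaly every year, and we count the fraction of years in which at least one region ends up more than 3σ from its mean. The function name, the number of simulated years, and the seed are arbitrary choices of mine.

```python
import random

def fraction_of_years_with_extreme(regions: int = 510, years: int = 1000,
                                   k: float = 3.0, seed: int = 0) -> float:
    """Fraction of simulated years in which at least one region deviates
    from its mean by more than k sigma (either sign).

    Assumes independent, normally distributed regional anomalies, i.e. the
    idealized picture with no heavy tails and no autocorrelation.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(years):
        # any() stops at the first region that exceeds the threshold.
        if any(abs(rng.gauss(0.0, 1.0)) > k for _ in range(regions)):
            hits += 1
    return hits / years

print(f"whole globe (510 regions): {fraction_of_years_with_extreme(510):.0%} of years")
print(f"land only   (170 regions): {fraction_of_years_with_extreme(170):.0%} of years")
```

Note that the simulation uses the exact Gaussian 3σ threshold (about 0.27% per region and year) rather than the rounded 0.3%, so its output sits slightly below what the rounded figure gives. Either way, a "3σ summer somewhere" is the rule rather than the exception, even before heavy tails and autocorrelation are switched on.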

At any rate, the probability that at least one of the 510 regions on the globe surpasses (or undershoots) the normal summer temperature by more than 3σ in a given year is about 78% (that is, 1 − 0.997^510), and it exceeds 95% for any two consecutive years. It's nearly guaranteed to happen. So even if you neglect the non-Gaussianity (especially the fact that in the real world, extreme deviations from the normal are much more likely than they are in the normal distribution: the decrease of the tails is slower); and even if you neglect autocorrelation (which means that the actual standard deviation is larger than what you could guess from a limited period of time because the temperatures in this period tend to clump), you will still conclude that the probability that you find a region such as the Moscow region whose temperature deviates by more than 3σ within a summer or two is close to 100%. It's pretty much guaranteed to happen, which also means that you can't deduce anything out of it.

This qualitative outcome becomes even more obvious if you take the non-Gaussianity and autocorrelation of the temperature into account. The probability that you may find similar "cute" stories about unusual temperatures or other things during 1-2 years somewhere on the globe is close to 100%. So one has to be a completely incompetent crackpot or a full-fledged fraudster if he claims that one may derive anything "spectacular" out of a 3σ summer heat wave in Moscow. Such things are guaranteed to occur in almost every year. You could say that it was Moscow in 2010 and it is Texas in 2011. If you took all the historical data and looked at previous years where the data may cover (almost) the whole globe, you would find similar results as well. Some years (such as 1911 which was superhot both in Europe and America, as I discussed in recent centennial stories) could show several regions; other years could offer no records. However, you would hardly find decades when "nothing of the sort" could be found anywhere in the world.

Man-made aspects

So in the text above, I explained 3 effects that vastly increase the probability that something like a 3σ Moscow heat wave actually occurs – from 0.3% implicitly claimed by Hansen et al. to something very close to 100%. But even if all these problems (non-Gaussianities, autocorrelation, and look-elsewhere effect) were fixed and circumvented and if one could find some properly defined, non-cherry-picked condition whose probability was very low but which was recently satisfied, it would still not be any evidence of a man-made effect. (Of course, the natural influences on the climate are partly related to the autocorrelations that have already been discussed.)

The reasoning of Mr Hansen and his comrades is indefensible on every single level.

And that's the memo.



Australia and freedom of speech

The Herald Sun brings a rather incredible story from Australia: once the approved $23/ton carbon dioxide tax starts to influence the prices of products (guess in what direction), members of a new carbon Gestapo will be walking the streets, harassing or arresting people who dare to admit that the carbon tax is behind the price hikes. Shops won't even be allowed to tell people "buy now before the price goes up" before the price goes up due to the carbon tax.

Doesn't Julia Gillard also want to prevent people from admitting that she is a lying totalitarian bitch who should be shown the door as soon as it becomes possible? In that case, she should share the same pleasure as the Romanian president on Christmas 1989.


snail feedback (2) :


reader notalotofpeopleknowthat said...

My analysis of this summer in Texas suggests it was no "hotter" than other hot years in the 20thC such as 1980. What made the average temperature hotter was that it lasted longer.

My understanding was that the Moscow heatwave was also the result of a blocking weather pattern.

Maybe Hansen could put his skills to work on explaining why such patterns occur instead of blaming everything on CO2.

http://notalotofpeopleknowthat.wordpress.com/2011/10/17/texas-summer-2011how-hot-was-it-really/


reader russedav said...

Good article. Those who demand the "consensus" of science be the voice of science would have muzzled most if not all of the great scientists. If Galileo had followed that nonsense we'd still be demanding belief in the sun revolving around the earth (geokinetic, not heliocentric, as some stupidly claim) and if Pasteur & Lister had obeyed these deranged, corrupt, lying fascists we'd still be promoting sepsis and having most patients die from the microbial infection their unpopular discoveries discovered that saved countless, something paid whoring useful idiots like Hansen would have prevented.