Sunday, August 15, 2010

Paper: Fake random data are better predictors than Mann proxies

One of the examples showing that the existing composition of the climate science community makes it impossible to achieve "collective" progress even on elementary scientific questions is the hockey stick controversy.

Almost a decade ago, it was understood - mostly by Ross McKitrick and Steve McIntyre - that the MBH (Mann-Bradley-Hughes) methodology is flawed because it assigns a higher weight to proxies whose time series exhibit a greater overall change (warming) in the instrumental period (20th century) than in the previous centuries.

Why is it so?

In a big enough ensemble, a fraction of the proxies (e.g. tree ring widths) inevitably possesses this property (having a bigger 1900-2000 change than e.g. 1700-1800 change) by chance, even though they're not correlated with the temperature. Because of this property, their voices are being "amplified" and they contribute to the "reconstructed temperature" according to the MBH methodology (because, according to Mann, this methodology "calculates" that such proxies are better because they are "more correlated with the thermometer readings"). And their average may be seen to look like a hockey stick.

It means that the hockey stick graph is generated even from random data, as long as they are continuous in time (autocorrelated), e.g. as in red (Brownian, i.e. random walk) noise.

Some time ago, I provided the TRF readers with a simple Mathematica notebook that produces such a nearly perfect hockey stick out of random data:
Mann fun notebook

Mann fun printed to a PDF
Steve McIntyre was kind enough to claim that your humble correspondent, together with McIntyre, David Stockwell, and Jeff Id, discovered this phenomenon independently (click and search for my name).

I think he's being too generous here - and not only here. I have only "rediscovered" the phenomenon because I knew what the result was. And I knew it because it was written in the McKitrick-McIntyre paper, even if the "localized explanation" why the methodology misbehaves in this way may have been missing or too cryptic - lost in between many other criticisms, usually those focusing on individual trees and other minor points.

Temperature "reconstruction" obtained from 5,000 "trees" with random-walk records that are completely unrelated to temperatures. The x-axis depicts the years 1000-2000. The behavior of the "trees" in the last 100 years was compared with a linearly increasing "thermometer" function to give the proxies a weight.
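The setup in the caption can be sketched in a few lines of Python. This is only a toy illustration of the weighting effect, not the Mathematica notebook and not the actual MBH algorithm: the function names, the use of a plain covariance as the weight, and the default of 1,000 "trees" (fewer than in the figure, to keep pure Python fast) are all assumptions of this sketch.

```python
import random


def random_walk(n_years, rng):
    """One fake 'tree': a running sum of independent Gaussian steps (red noise)."""
    level = 0.0
    series = []
    for _ in range(n_years):
        level += rng.gauss(0.0, 1.0)
        series.append(level)
    return series


def covariance_with_ramp(segment):
    """Covariance of a segment with a linearly rising 'thermometer' 0, 1, 2, ..."""
    n = len(segment)
    mean_t = (n - 1) / 2.0
    mean_x = sum(segment) / n
    return sum((t - mean_t) * (x - mean_x) for t, x in enumerate(segment)) / n


def reconstruct(n_proxies=1000, n_years=1000, calib=100, seed=0):
    """Weighted average of random walks; weights come only from the last `calib` years.

    Assumption: the weight is the plain covariance with the ramp, so a walk
    that happens to fall in the calibration window gets a negative weight
    and is effectively flipped.
    """
    rng = random.Random(seed)
    recon = [0.0] * n_years
    for _ in range(n_proxies):
        proxy = random_walk(n_years, rng)
        weight = covariance_with_ramp(proxy[-calib:])
        for year, value in enumerate(proxy):
            recon[year] += weight * value
    return [total / n_proxies for total in recon]
```

Calling reconstruct() and plotting the returned list against the years should give a roughly flat, noisy shaft followed by a rising blade in the last `calib` entries. The blade is built in: the covariance of the weighted average with the ramp equals the average of the squared weights, so it cannot be negative, whatever the random walks look like.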

At any rate, it is damn obvious that the methodology is incorrect and its "hockey stick" conclusion about the historical temperatures is unsupported by the actual data. The conclusion may be right; more likely, it is wrong. But Mann et al. have surely found no positive evidence that it is right.

Why are there still people, including people employed as climate scientists, who claim that they don't realize that the Mann methodology is flawed? They're clearly not up to their job. Could someone please finally fire them? One of the reasons why some laymen may fail to get the point is that they just don't understand the technology, not even the simple argument above. They're waiting for journals.

Well, a prestigious statistics journal, Annals of Applied Statistics, has just scheduled a paper for publication in the next issue. The paper has the same content:
McShane and Wyner: A Statistical Analysis of Multiple Temperature Proxies: Are Reconstructions of Surface Temperatures Over the Last 1000 Years Reliable? (full text PDF, backup, Google Docs Viewer as HTML)
McShane and Wyner 2010 is 2.5 MB and 45 pages long in this format, and it shows that the MBH-selected proxies actually lead to worse predictions than the authors' fake data. The Mann proxies don't score well in their match against more sophisticated null hypotheses. You will find explicit sentences that you have seen written on this blog many times, for example:
That is, the very method used in Mann et al. (1998) guarantees the [hockey stick] shape of Figure 1. (Page 5)
Well, this point has been known for at least 7 years, but maybe some people will suddenly start to take notice when papers with this no-longer-original finding start to appear in conventional peer-reviewed journals. It's sad that science has to go through these irrational political battles when nasty folks and aggressive crackpots of Mann's type - who should spend their lives in jail - may prevent knowledge from spreading for 7 years. But even in the real world, the lies have short legs, after all. They can't get too far.

Hat tip: Steve McIntyre's and Anthony Watts' blogs


  1. I think it's time to revive and deploy the age-old witticism that I used to read (and, I admit, occasionally write) on the walls of my elementary school bathroom:

    "Michael Mann je vůl!" (Czech for "Michael Mann is an ox!", i.e. a blockhead)

    Michael J. Kubat

  2. What does this imply about the CO2 reconstructions?

  3. So, Lubos.
    how do you account for 'hockey sticks' in proxies unrelated to tree-rings, such as borehole records, glacier length records, sediment records from fluvial systems etc? Seems that present climate is different from at least the last 1000 years after all!

  4. Dear Michael Kubat,

    wow, the pupils (or teachers?) at that school must be engaged in science/politics... ;-)

    Dear Aaron,

    this statistical finding is about a lethal bug in the mathematical method. This bug screws any analysis where the methodology is used.

    So if you allowed Michael Mann to reconstruct the CO2 concentration or the Capitol in D.C., they would be screwed, too. I don't think that he's been allowed to do these things. ;-)

    Dear San,

    the same comment. You seem to misunderstand that the flaw of Mann's methodology has nothing to do with the proxies' being trees or anything else that is specific.

    No group of proxies shows any unprecedented temperature in the 20th century and those that do are doing so by chance. Proxies are bound to show lots of noise unrelated to the temperature and some of them will inevitably show what you like - and given your exclamation marks, it's fair to say that you are personally obsessed with the bogus claim that something is unusual about the 20th century climate.

    But the Mann methodology isn't just about realizing that some of the trees or boreholes may show what you like; it is a method that systematically picks the hockey stick shapes.


  5. I have to admit I'm a bit perplexed by all this proxy mumbo-jumbo. Even if one had solid empirical evidence that the last hundred years or so are an anomaly within the last thousand years of climate, how can one possibly extrapolate from this that it is an anomaly within our billions of years of climate history? Seems to me that any proxy argument that only goes back a thousand years is useless anyway.

  6. This is a much more complex version of a similar phenomenon I experienced in writing a paper for an econ course in college. I ran a regression of employment data vs investment in basic industries for a region. The r-squared was a virtually impossible .99 - a perfect fit. In reality, I had misplaced a decimal point on one of my punch cards, and it happened to be on a data point with a high value for the dependent variable. In other words, my mistake made all the data look like background noise except for the one data point that I "predicted" so accurately. As the saying goes, garbage in, garbage out. Or in this case, it's more like: if your statistical technique has the finesse of a garbage disposal, you're going to get a sloppy meaningless mess on the other side no matter what you run through it.