Thursday, January 31, 2013

Czech temperature record 1961-2012

Because Frank E. L. asked me about related issues, I have downloaded all of the Czech Hydrometeorological Institute's monthly/regional temperature data from the years 1961-2012 into Mathematica and calculated all the statistical quantities I considered interesting.
Here is the PDF preview of the notebook.
Click on the red tile. Let me describe what I have found.

First, I had to figure out the right URLs of the HTML pages that contain the tables in an importable enough format, import them with some sensible formatting options to Mathematica so that they quickly become an array, and replace decimal commas by decimal points.
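The decimal-comma step can be sketched in a few lines. This is a Python illustration rather than the Mathematica code from the notebook, the function names `parse_decimal_comma` and `parse_row` are hypothetical, and the sample row is made up, not copied from the institute's tables.

```python
def parse_decimal_comma(cell: str) -> float:
    """Turn a string like '-2,8' (decimal comma) into the float -2.8."""
    cleaned = cell.strip()
    # Assumption: a page might use a typographic minus or en dash; normalize both.
    cleaned = cleaned.replace("\u2212", "-").replace("\u2013", "-")
    return float(cleaned.replace(",", "."))


def parse_row(cells: list[str]) -> list[float]:
    """Convert one table row of decimal-comma strings to floats."""
    return [parse_decimal_comma(c) for c in cells]


# Hypothetical row of twelve monthly values, for illustration only.
sample = ["-2,8", "-1,1", "2,5", "7,3", "12,4", "15,5",
          "17,0", "16,6", "13,0", "8,0", "2,7", "-1,0"]
print(parse_row(sample))
```

Applying `parse_row` to every row of an imported table would give the numeric array described above.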

When we're done, we have the temperature for each of the 12 months of each of the 52 years for each of the 1+13 regions of the Czech Republic. To be clear, Czechia has 14 administrative regions (the division was different in the communist era, and it's not one of the things in which I consider communism to have been worse: I don't really care). In this dataset, however, Prague is merged with the Central Bohemian region that surrounds the capital, so we only have 13 "weather regions". They are sometimes supplemented by a 14th "region", placed at the beginning, which is the whole country.

Well, your humble correspondent rewrote the names of the regions with the right character set and diacritics. It still displayed incorrectly in the PDF and most readers can't read Czech, anyway. ;-)

Now, I have checked the consistency of the data – the web pages show the actual monthly temperature for each month/year/region combination, the normal temperature for this month/region combination (that should be independent of the year), and their difference, the temperature anomaly.

The "normal temperatures" were listed identically almost everywhere. The only mistake appeared in January 1961 (the first year) for the Southern Bohemian region, which says +2.8 although all other years agree it should be –2.8 (all these numbers are in degrees Celsius). So I suppose +2.8 is a typo, which also makes the anomaly for that single month/region combination incorrect.

Then I verified whether their anomalies are really equal to the differences between the actual temperatures and the normal temperatures. The match turned out to be exact almost everywhere. However, there were 49 month/year/region combinations where the listed anomaly was off by plus or minus 0.1 °C, probably due to rounding. Curiously enough, all these mismatches appeared in the years 1974, 1981, 2003, 2005, and 2012. They are a tiny minority of the month/year/region combinations: if an algorithm capable of producing such rounding errors had been used everywhere, they should have occurred in about 1/2 of the data, i.e. in thousands of combinations. So one may see some inconsistency in the rounding schemes: the rounding was probably done by different people or at different moments for different years.
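A consistency check of this kind might look as follows in Python. This is only a sketch under assumptions: the record layout `(year, month, region, actual, normal, anomaly)`, the function name `find_mismatches`, and the four sample records are all invented for illustration and are not the notebook's code or the institute's data.

```python
from collections import Counter


def find_mismatches(records, tol=0.05):
    """Return records whose listed anomaly differs from actual - normal.

    Each record is (year, month, region, actual, normal, anomaly).
    A tolerance of 0.05 flags any disagreement of 0.1 or more while
    ignoring floating-point noise.
    """
    return [r for r in records if abs((r[3] - r[4]) - r[5]) > tol]


# Made-up records for illustration only.
records = [
    (1961, 1, "A", -1.3, -2.8, 1.5),   # consistent
    (1974, 5, "A", 12.9, 12.4, 0.4),   # listed anomaly off by 0.1
    (1974, 6, "B", 15.0, 15.5, -0.5),  # consistent
    (2012, 7, "B", 18.2, 17.0, 1.3),   # listed anomaly off by 0.1
]
bad = find_mismatches(records)
print(Counter(r[0] for r in bad))  # number of mismatches per year
```

Counting the flagged records per year is what reveals whether the mismatches cluster in a few years, as they did here.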

At any rate, the actual temperatures seem trustworthy enough. I haven't verified whether the tables did the correct averaging over the years and the correct averaging over the 13 weather regions – some new rounding issues could be found and discussed here as well, I guess.

So I calculated my own normal temperatures. For each region, they're a pretty nice sine that goes from –4 through –2 °C in January to +15 through +18 °C in July. You may see that the differences between the regions are slightly larger in the spring and the summer than they are in the fall or the winter.

The maximum temperature anomaly for a month/year/region combination was 6.3 degrees. The root mean square anomaly was 1.93 °C. Yes, if you average the temperature over a month, they give you a pretty nice Gaussian centered at the "normal temperature" whose standard deviation is almost two degrees. That's true in the Czech Republic. The numbers may differ in your country or region. The fluctuations are likely to be smaller near the sea and larger deep inside the continents.
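For readers who want to reproduce these summary numbers on their own data, here is a minimal Python sketch. The helper names and the sample anomalies are hypothetical. The last helper reports the fraction of anomalies within one RMS of zero, which would be roughly 68% for an exactly Gaussian distribution centered at zero.

```python
import math


def rms(values):
    """Root mean square of a list of anomalies."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def max_abs(values):
    """Largest anomaly in absolute value."""
    return max(abs(v) for v in values)


def fraction_within(values, width):
    """Fraction of anomalies with |value| <= width."""
    return sum(1 for v in values if abs(v) <= width) / len(values)


# Made-up anomalies in °C, for illustration only.
anomalies = [0.5, -1.5, 2.5, -0.2, 1.1, -3.0, 0.0, 1.9]
sigma = rms(anomalies)
print(sigma, max_abs(anomalies), fraction_within(anomalies, sigma))
```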

When I split the anomalies by individual years, the histograms were much less Gaussian and more "noisy". In different years, the root mean square anomaly went from 1.2 to 3.0 °C or so. When I split the anomalies by the 13 or 14 individual regions, the Gaussians were a bit smoother and all the root mean square values of the anomalies were between 1.85 and 2.05 °C.

Now, the trends. Note that when we calculate the trends, the year variable "drops out" because it's already been used to calculate the trend. So we only have at most 12*14 = 168 month/region combinations for which the trend may be computed. The mean trend is 2.8 °C per century. Czechia has obviously seen a faster rate than the globe: the latter had close to 1 °C in the last 50 years. The histogram is somewhat noisy and almost all the entries show a trend between 0 and 6 °C per century.
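The post does not say how each trend was fitted, so the following Python sketch assumes an ordinary least-squares line through the 52 yearly values of one month/region combination, with the slope rescaled to °C per century. The function name `trend_per_century` and the synthetic series are illustrative assumptions.

```python
def trend_per_century(years, temps):
    """Least-squares slope of temps against years, in °C per century."""
    n = len(years)
    mean_y = sum(years) / n
    mean_t = sum(temps) / n
    sxx = sum((y - mean_y) ** 2 for y in years)
    sxy = sum((y - mean_y) * (t - mean_t) for y, t in zip(years, temps))
    return 100.0 * sxy / sxx  # slope per year times 100


# Synthetic series: exactly 0.028 °C per year on top of 10 °C, no noise.
years = list(range(1961, 2013))  # 52 years
temps = [10.0 + 0.028 * (y - 1961) for y in years]
print(trend_per_century(years, temps))
```

Running this once per month/region combination would yield the 12*14 = 168 trend values whose histogram is discussed above.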

When the trends are split by the 14 individual regions, they are all between 2.45 and 3.15 °C per century (the months are still lumped together here). All these differences between the regions may be described as noise.

However, a surprising and rather curious observation is that the temperature trend depends vastly on the month, from January to December, when all the regions are lumped together. It changes in a zig-zag way. The trend for Octobers is close to 0.2 °C per century, virtually zero, while the trend for Mays and Augusts exceeds 4.3 °C per century.

Do you have an explanation for this zigzag behavior?

A possible explanation you may suggest is that due to leap years, Januaries of different years are "different parts of the year", and similarly for the other 11 months. However, this explanation seems to produce far too weak an effect. At most, the shift of a month is by half a day in one way or another. Seasonally, half a day only makes 0.1 °C of a difference and a vast majority of these effects gets averaged out, anyway, because among the 52 years, some of them will have excesses and some of them won't.
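The 0.1 °C figure can be checked with a rough estimate. The sketch below assumes the seasonal cycle is a pure sine whose January and July values are the midpoints of the ranges quoted earlier (about -3 °C and +16.5 °C); both the sinusoidal model and the function name `half_day_effect` are assumptions made for this illustration. The steepest seasonal change of a sine with amplitude A and a period of one year is 2πA/365.25 per day.

```python
import math


def half_day_effect(jan_mean, jul_mean, days_per_year=365.25):
    """Largest temperature change caused by a half-day calendar shift,
    assuming a sinusoidal seasonal cycle between the two monthly means."""
    amplitude = (jul_mean - jan_mean) / 2.0
    max_slope = amplitude * 2.0 * math.pi / days_per_year  # °C per day
    return 0.5 * max_slope


# Midpoints of the January and July ranges quoted in the post.
print(half_day_effect(-3.0, 16.5))
```

Even at the steepest part of the seasonal curve, the result stays a bit below 0.1 °C, in line with the estimate above.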

So we are forced to conclude that there's been no climate change in Octobers, almost no climate change in Septembers, and a negligible climate change in Februaries. ;-) Does a similar pattern exist at other places? Do you have an explanation of this strong dependence on the month?

One more hint: it could have something to do with other drivers such as aerosols. Maybe the no-trend months – October, September, February – are the worst smog months and smog has contributed a negative amount to the warming trend which only seems to operate when there's actually smog? Do you have a better idea? What should be the differences expected from a sensible statistical/weather model you would think of?


  1. Well, it would be consistent with a progressively smaller albedo in the spring and summer, i.e. fewer and fewer clouds.

  2. You previously reported temperature trends for Austin, Texas, and, for the past 73 years or so, found a decrease for the winter months but an increase during the spring and summer with no significant overall trend (a slight cooling, actually).

    This may be a related effect and I can think of no more likely cause than aerosols.

  3. Dear Gene, if it's due to your pure memory, not enhanced by searches ;-), then your memory is amazing.

    Yup, Austin was where I previously looked!

    The graphs for different dates are also rather chaotic but don't seem to be too similar to the Czech ones as a function of the month (or date), unless I overlook some pattern which isn't obvious to me.

    The general lesson that the trends are highly date-dependent holds. This invalidates models in which one would try to imagine that the temperature on date YYYY/MM/DD is a function of YYYY (with a slow trend) plus a function of MM and DD (seasonal plus noise) because I believe that this very assumption would imply a smaller dependence of the trend on the date. In other words, it excludes almost all models I can sensibly think of. ;-)

    One might perhaps say that all the trends one extracts for the dates are noise - including their average that seems substantially nonzero.

  4. Yes, I was being a bit lazy. I agree that all the data are pretty chaotic but that’s the nature of weather, isn’t it?

  5. Dear Gene, at a very general level, I agree. But even if chaos seemingly allows "everything", I can't imagine a good model of chaos that would allow this dependence on the date. It implies that high-frequency (date-dependent) changes matter - but these high-frequency changes still can't be described as something whose character is independent of the year.

    Right, that's a way to formulate it: the weather seems to exhibit very strong correlations between dates and years, between the YYYY and MM/DD part of the information. It's needed for this surprising outcome.

  6. Lubos,

    Maybe this chart will help. It's a graph of the average difference of today's temp rise - tonight's drop x 100 for all weather stations North of 23 Lat from 1950 to 2010.
    Spring and fall are near the peak positive and negative differences. I'm not sure how it relates to your mystery, but maybe it will spark something in your mind.

  7. These "actual temperatures for the month" would be calculated how? The arithmetic mean of the minimum and maximum temperature extremes measured on every day?

    If you have access to the data for the original extremes, you may be able to detect a change in instrumentation. Such a change reduced the time constant of the "thermometer" from tens of seconds to less than a second. It came together with a change in enclosures: the original internal volume of "buffered" air was around one cubic metre, whereas it's now not even 10% of that, and the enclosures are much more sensitive to where they are mounted as well as to winds.

    The assumption that such an arrangement will result in "a similar" average appears to have no technical basis. WMO has left the "adjustment" to account for the differences in measurement methods up to the climatologists to sort out. There seems to have been no effort made to adjust the modern data so that the time constants are even somewhat similar to the traditional mercury thermometer(s) in a large, screened enclosure.

    All analysis is only as good as the data.

  8. I can venture a hypothesis for this behaviour.

    The 2 parameters that impose the temperature at first order are the solar irradiance and the cloudiness.

    If the cloudiness was constant, one would expect the monthly variability to follow an inverted U curve. Indeed, in winter there is little irradiance and high albedo so the cloudiness doesn't matter much. In summer it is the contrary so the cloudiness dictates the temperatures.

    As the variability doesn't seem to follow an inverted U curve, that means that the cloudiness is not constant but varies much with the month.

    As there are no data for cloudiness, one would need at least the atmospheric pressure data as a proxy for cloudiness (e.g. high pressure = low cloudiness and low pressure = high cloudiness).

    If my hypothesis is approximately correct, then one should observe a very different monthly behaviour for atmospheric pressure.

    For example a trend from low to high in May (and also a high variability) while no trend (and low variability) in October. The seasonal variations of atmospheric pressure are a complicated matter. For instance, there are some marked periodic features (like sustained lows in the fall) and some periods (generally spring) which are completely chaotic. These observations of long-term statistics are often contained in popular wisdom like "Časné jaro - mnoho vody, jarní deště - mnoho škody." ("Early spring, lots of water; spring rains, lots of damage.") etc.

    Of course this behaviour should also vary strongly with latitude - markedly so for higher latitudes and less or not at all for lower latitudes.

  9. Dear Tom, an interesting hypothesis. If I fix your conjecture, you predict that the trends should be greater for months that have a higher potential for the cloudiness on that month to vary, i.e. months when cloudiness is close to 50 percent, right?

    The trends are such chaotic functions of the month that I am not sure what it means to test such a hypothesis. Note that Octobers - the cloudy months, I would say - have seen almost no trend. So I suppose that in your picture it's because the cloudiness is close to "maximized" in all years.

  10. Yes, it is approximately that, Lubos.
    The part that is not a conjecture is that the cloudiness modulates the temperatures at first order. For example, a sunny August day in Chotebor will reach 30°C while a cloudy rainy day in the same August will be at 15°C.
    On the contrary, a sunny day in December will be at -5°C while a cloudy day in the same December will be at a similar temperature.
    From that it follows that cloudiness doesn't matter (much) when there is little sun anyway (e.g. winter) while it is the leading variable when there is much sun (e.g. summer).
    This observation explains why the temperature variability (sigma) would be high in summer and low in winter.
    The part that is a conjecture is that cloudiness was not constant in time for a fixed month but presents a trend.
    For instance, if there is a trend in May with less cloudiness today than 100 years ago, then the consequence would be a "trend" in temperatures but its causal explanation would be the variation of cloudiness.
    To verify the conjecture one would need data on cloudiness which are not available.
    That's why one could use atmospheric pressure as a proxy and, if pressures are available, verify whether the average monthly pressure exhibits the same kind of asymmetric behaviour as the temperatures do.
    If yes, then the temperatures are explained - it was the clouds (it still doesn't explain why the clouds would do what they did but the conjecture makes sense).
    If not, then the conjecture is falsified and we are left with an enigma (seasonal smog?)