Monday, December 31, 2012

Prediction isn't the right method to learn about the past

Happy New Year 2013 = 33 * 61!

The last day of the year is a natural moment for a blog entry about time. At various moments, I wanted to write about the things that the year 2012 brought us.

The most important event in science was the discovery of the \(126\GeV\) Higgs boson (something that made me $500 richer but that's of course the least important consequence of the discovery) but those of us who were following the events and thinking about them rationally have known about the \(126\GeV\) Higgs boson since December 2011.

Lots of other generic popular science sources recall the landing of Curiosity and other things. But let's discuss something else. Something related to time.

Cara Santa Maria of The Huffington Post (I thought that Santa Maria was a ship, not a car) posted an article about the arrow of time and embedded the following video interview with Sean Carroll.

Clearly, he hasn't learned or understood anything at all over those years. Maybe it is difficult to get a man to understand something when his job depends on not understanding it. ;-) Once again, we hear that the hottest thing in cosmology is the fact that the early Universe had a low entropy (in reality, it really follows from a defining property of the entropy which has been known from the first moment when entropy was introduced in the 19th century).

The picture with the most concentrated wrongness appears around 2:24 in the video above:

Starting from the dot at the "present", Carroll proposes to predict the future and to "predict the past" [sic]. In both cases, the entropy increases relative to the entropy of the present state.

A very similar picture appears in Brian Greene's book The Fabric of the Cosmos. Brian's picture is even worse because he suggests that the graph of the entropy is smooth, like \(S=(t-t_0)^2\), so its derivative vanishes at \(t=t_0\). It surely has no reason to vanish. Moreover, Brian omits the helpful part of the graph "actual past".

Now, look at the picture again. You see that Carroll "predicts the past" but his "prediction" for the entropy completely and severely disagrees with the "actual past". Whatever method he used to determine that the entropy was "actually" lower in the past, he wasn't able to derive this elementary fact because his derivation led to the wrong result, the "predicted past"; he must have some above-the-science method to find the right answers without science even when his scientific methods produce wrong predictions.

Prague clearly resembled a military front again last night.

In science, when your prediction disagrees with the facts, you must abandon your theory. Instead, Sean Carroll just doesn't care. He isn't thinking as a scientist at all. The disagreement between his predictive framework and the empirical fact means nothing for him; he just continues to use and promote his wrong predictive framework, nevertheless.

It's easy to see why his "prediction" of the past is wrong. The reason is that he is using the same method – prediction – that we use to predict the future. He thinks about the past in the same way as if it were the future. However, the very term
"prediction of the past"
is a logical oxymoron. It is exactly as inconsistent a sequence of words as
"sweeten your tea by adding lemon".
You just can't make your tea any sweeter by adding lemon! Instead, you need sugar, stupid. In the same way, it is wrong to use the particular method of "prediction" when you want to say/guess/reconstruct/determine something about the past. The method of "prediction" is, by definition, only good for learning something about the moment \(t_2\) out of the data about the physical system at time \(t_1\) when \(t_2\gt t_1\): you may only predict a later moment (a moment in the future, if we talk about predictions that are being made now) out of an earlier one, not vice versa!

All successfully verified predictions in science – where we use the usual methodology of predictions – satisfy this property that the predicted moment occurs later than the moment(s) at which some facts are known and inserted as input to the problem. If you use the methodology in the opposite way, it just doesn't work! This method of determining the past is as wrong as an attempt to sweeten your tea by lemon. The wrong graph of the entropy in the past on the picture above is the easiest – and a rather universal – way to see that the methodology doesn't work for "predictions of the past".

Instead, if you want to say something valid about the past, you need to use a different methodology: retrodiction. But retrodictions obey completely different rules than predictions. Predictions produce objective values of probabilities of future events out of known facts about the past; in this sense, predictions "emulate" what Nature Herself is doing when She actually decides what to do with the world at a later moment out of the state at an earlier moment, when She is evolving the world. On the other hand, retrodictions can never produce any objective probabilities at all. The reason is that retrodictions are a form of Bayesian inference.

Bayesian inference is a method to update our opinions about the probability of a hypothesis once we see some new evidence. Now, the state (or a statement about some properties) of the physical system in the past is an example of a "hypothesis" and the data collected now (at a later moment) are an example of the "evidence".

What's important is that the Bayesian inference is a "reverse process" or a solution to an "inverse problem". The straightforward calculation starts from a hypothesis (an initial state is a part of a hypothesis about evolution) and this hypothesis predicts objective probabilities for the later moment, for the future, if you wish. These probabilities are objectively calculable because the future literally evolves out of the earlier moment (the past).

But it is not guaranteed that you may revert this evolution – or this reasoning. And indeed, in general, you can't. In fact, in statistical physics, you can't. And in quantum physics, you can't do it, either. The reason is that whenever you discuss the fate of any facts or measurements that may only be predicted statistically – and it is true both in quantum mechanics as well as in statistical physics (even in classical statistical physics) – things are simply irreversible.

If you start with a hot tea on the table, you may predict when the tea-desk temperature difference drops below 1 °C. However, if you start with a tea that is as cold as the desk, you can't say when it was 60 °C hot. This problem simply has no unique solution because the evolution isn't one-to-one; it isn't reversible. Whenever the boiling tea was poured into the cup, it will ultimately end up as a cold tea.
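To see the tea example in numbers, here is a minimal sketch. It assumes Newton's law of cooling with a made-up rate constant `K` and a thermometer with a made-up resolution `RESOLUTION`; neither number comes from the argument above, they only illustrate it. The forward question has one answer, while teas poured at very different moments produce the same reading today.

```python
import math

K = 0.1           # assumed cooling rate in 1/min (hypothetical)
RESOLUTION = 0.1  # assumed thermometer resolution in °C (hypothetical)

def temp_difference(d0, minutes, k=K):
    """Tea-desk temperature difference after `minutes`, by Newton's law of cooling."""
    return d0 * math.exp(-k * minutes)

def time_to_reach(d0, target, k=K):
    """Prediction: minutes until the difference falls from d0 to target."""
    return math.log(d0 / target) / k

# Prediction: a unique answer for a tea that starts 60 °C above the desk.
t_pred = time_to_reach(60.0, 1.0)

# Retrodiction attempt: teas poured 3, 6 and 24 hours ago give the same
# measured difference now, so the reading cannot tell them apart.
readings = {
    hours: round(temp_difference(60.0, hours * 60) / RESOLUTION) * RESOLUTION
    for hours in (3, 6, 24)
}

print(f"difference drops below 1 °C after {t_pred:.1f} min")
print("measured differences now:", readings)
```

Every entry of `readings` is the same number, so the measurement made now carries no information about which pouring time was the real one.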

People such as Sean Carroll or Brian Greene correctly notice that the microscopic laws of Nature are time-reversal-invariant (more precisely, CPT-invariant if we want to include subtle asymmetries of the weak nuclear force) but they're overinterpreting or misinterpreting this fact. This symmetry doesn't mean that every statement about the future and past may be simply reverted upside down. It only means that the microscopic evolution of particular microstates – pure states – to particular other microstates – pure states – may be reverted.

But no probabilistic statements may actually be reverted in this naive way. They can't be reverted for the same reason why \(A\Rightarrow B\) is inequivalent to the logical proposition \(B\Rightarrow A\). The laws of Nature imply facts of the type \({\rm Past}\Rightarrow{\rm Future}\) but these facts can't be translated to \({\rm Future}\Rightarrow{\rm Past}\) because you would have to check all other conceivable initial states in the past and prove that all of them imply something about the future (i.e. evolve to states in the future that still obey a certain special condition) – which is virtually never the case. The past and the future play asymmetric roles in mathematical logic because of the \(A\)-\(B\) asymmetry of the logical proposition \(A\Rightarrow B\), the implication.

To deal with the microstates only – for which the time-reversal symmetry holds – means to deal with equivalences \(A\Leftrightarrow B\) only. But this template doesn't allow us to make any realistic statements about physics because the pure states "equivalent" to some states in the past (the future states that evolve from them) are complicated probabilistic superpositions or mixtures that can't be measured. Whenever we make some measurement, we need to talk about microstates that aren't equivalent to some natural states/information at an earlier moment which is why we need the statements of the type \(A\Rightarrow B\) almost all the time and these implications simply violate the \(A\)-\(B\) symmetry.

In particular, if you fail to specify the precise coordinates and velocities of all atoms in your tea, or if you're talking about a large/nonzero entropy of your tea at all, then you are clearly not talking about a particular microstate. You are only talking about some ensembles of operationally indistinguishable microstates (which is why the entropy is nonzero) or, equivalently, about partial, probably macroscopic properties of your tea. And statements of this sort – for example all statements about the entropy of the tea or the tea-desk temperature difference – simply refuse to be time-reversal-invariant! Lots of friction forces, viscosity, diffusion, and other first-time-derivative terms breaking the time reversal symmetry inevitably emerge in the effective laws controlling these quantities and propositions. All the laws that govern the macroscopic quantities average and/or sum over the microstates and the right way to do so inevitably breaks the past-future symmetry "maximally". For example (and it is the most important example), the entropy-decreasing processes are exponentially less likely than their time-reversed partners that increase the entropy.

As I have emphasized many times, the asymmetry arises because the calculated probabilities must be averaged over the initial microstates but summed over the final microstates. Averaging and summing isn't quite the same thing and this difference is what favors the higher-entropy final states.
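The averaging-versus-summing asymmetry can be checked on a toy model. The sketch below assumes 12 microstates, a symmetric (hence microscopically time-reversal-invariant) transition probability `micro_prob`, and two macrostates `A` and `B` with 2 and 10 microstates; all of these are illustrative choices, and the entropy is taken as the logarithm of the number of microstates with \(k_B=1\).

```python
import math

N = 12
A = range(0, 2)    # low-entropy macrostate: 2 microstates (assumed)
B = range(2, 12)   # high-entropy macrostate: 10 microstates (assumed)

def micro_prob(i, j, c=0.05):
    """Symmetric toy transition probability between microstates i and j."""
    if i != j:
        return c / (1 + abs(i - j))
    # The diagonal entry absorbs the rest so that each row sums to one.
    return 1.0 - sum(c / (1 + abs(i - k)) for k in range(N) if k != i)

def macro_prob(initial, final):
    """Average over the initial microstates, sum over the final microstates."""
    total = sum(micro_prob(i, j) for i in initial for j in final)
    return total / len(initial)

p_up = macro_prob(A, B)      # entropy-increasing transition
p_down = macro_prob(B, A)    # entropy-decreasing transition
ratio = p_up / p_down
entropy_gap = math.log(len(B)) - math.log(len(A))

print(ratio, math.exp(entropy_gap))
```

The double sum over microstate pairs is identical in both directions because `micro_prob` is symmetric; only the division by the number of initial microstates differs, so the ratio equals \(\exp(S_B-S_A)\), here 5.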

There is one more consequence I have emphasized less often. The averaging (over initial state) requires "weights". If you have a finite number \(N\) of microstates, you may assign the weights \(p_i=1/N\) to each of them. However, it's not necessarily the choice you want to make or believe. There may exist evidence that the actual probabilities of initial microstates \(p_i\) – the prior probabilities – are not equal to each other. The only thing that will hold is\[

\sum_i p_i = 1.

\] The possible initial microstates differ, at least in principle. You may accumulate evidence \(E\) – it means a logical proposition you know to be true because you just observed something that proves it – which will force you to change your beliefs about the probabilities of possible initial states according to Bayes' theorem:\[

P(H_i|E) = \frac{P(H_i)\cdot P(E|H_i)}{P(E)}

\] The vertical line means "given". So the probability of the \(i\)-th hypothesis (the hypothesis that the initial state was the \(i\)-th state) given the evidence (which means "after the evidence was taken into account") is equal to the prior probability \(P(H_i)\) of the initial state (the probability believed before the evidence was taken into account) multiplied by the probability that the just observed evidence \(E\) occurs according to the hypothesis \(H_i\) and divided by the normalization factor \(P(E)\), the "marginal likelihood", which must be chosen so that the total probability of all mutually exclusive hypotheses remains equal to one:\[

\sum_i P(H_i|E) = \sum_i \frac{P(H_i)\cdot P(E|H_i)}{P(E)} = 1.

\] Note that \(P(H_i|E)\) and \(P(E|H_i)\) aren't the same thing (another potential critical mistake that the people believing in a naive "time reversal symmetry" are probably making all the time as well) but they're proportional to each other. The hypothesis (initial microstate) for which the observed evidence is more likely becomes more likely by itself; the initial states that imply that the evidence (known to be true) cannot occur at all are excluded.
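A minimal sketch of this update for a finite list of hypotheses follows. The function name `posterior` and the numbers in `priors` and `likelihoods` are illustrative assumptions, chosen so that one hypothesis makes the evidence impossible.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem: P(H_i|E) = P(H_i) * P(E|H_i) / P(E)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # P(E), the normalization factor
    if evidence == 0:
        raise ValueError("the evidence is impossible under every hypothesis")
    return [x / evidence for x in joint]

priors = [0.25, 0.25, 0.25, 0.25]    # equal weights p_i = 1/N for N = 4
likelihoods = [0.8, 0.4, 0.0, 0.4]   # assumed P(E|H_i); the third one forbids E
post = posterior(priors, likelihoods)

print(post)
```

The hypothesis under which the evidence is most likely gains weight, the one that forbids the evidence drops to zero, and the posterior probabilities still sum to one.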

A particular observer has collected certain kinds of evidence \(E_j\) and he has some subjective knowledge which determines \(P(H_i|E_{\rm all})\). It's important that these probabilities of the hypotheses are subjective, they depend on the evidence that a particular observer has accumulated and labeled trustworthy and legitimate. They become prior probabilities when a new piece of evidence emerges. And indeed, one of the most notorious properties of the prior probabilities is that they are totally subjective and there's no way for everyone to agree about the "right priors". There aren't any objective "right priors".

Except for the Czechoslovak communist malls, Priors, which had to be believed to be objectively right. However, Prior is an acronym for "Přijdeš rychle i odejdeš rychle" (You quickly arrive as well as quickly depart) which quantified the product selection.

That's why the retrodicted probabilities of initial states \(p_i=P(H_i)\) always depend on some subjective choices. What we think about the past inevitably depends on other things we have learned about the past. This is a totally new property of retrodictions that doesn't exist for predictions. Predictions may be probabilistic (and in quantum mechanics and statistical physics, they are inevitably "just" probabilistic) but the predicted probabilities are objectively calculable for certain input data. The formulae that objectively determine these probabilities are known as the laws of physics. But the retrodicted probabilities of the past are not only probabilistic; their values inevitably depend on the subjective knowledge, too!
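The dependence on priors is easy to exhibit. In the sketch below, the forward probabilities in `P_COLD_NOW_GIVEN` play the role of the laws of physics and are shared by everyone, while the two observers and their priors are hypothetical; all numbers are made up for illustration.

```python
# Forward (predictive) probabilities P(E|H): assumed toy numbers,
# identical for every observer.
P_COLD_NOW_GIVEN = {"poured boiling": 0.90, "poured lukewarm": 0.99}

def retrodict(prior):
    """Posterior over initial states given the evidence 'the tea is cold now'."""
    joint = {h: prior[h] * P_COLD_NOW_GIVEN[h] for h in prior}
    norm = sum(joint.values())  # P(E)
    return {h: v / norm for h, v in joint.items()}

# Two hypothetical observers with different background knowledge,
# hence different priors about how the tea was poured.
alice = retrodict({"poured boiling": 0.5, "poured lukewarm": 0.5})
bob = retrodict({"poured boiling": 0.9, "poured lukewarm": 0.1})

print(alice)
print(bob)
```

Both observers use the same predictive probabilities and the same evidence, yet they end up with different retrodicted probabilities for the initial state because their priors differ.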

Of course, when the past is determined by the correct method – the method of retrodictions which is a form of Bayesian inference – we will find out that the lower-entropy states are exponentially favored. We won't be able to become certain about any property of the Universe in the past but some most universal facts such as the increasing entropy will of course follow from this Bayesian inference. In particular, the correctly "retrodicted past entropy" will more or less coincide with the "actual past" curve.

I think that even the laymen implicitly know how to reconstruct the past. They know that it's a "reverse problem" of a sort and they secretly use the Bayes theorem even if they don't know the Bayes formula and other pieces of mathematics. They are aware of the fact that the tea-desk temperature difference was higher in the past exactly because this difference is decreasing with time. More generally, they know that the entropy was lower in the past exactly because the entropy is increasing, was increasing, and will be increasing with time. They know that determining the past by the same logic by which we predict or expect the future is wrong, stupid, and it contradicts common sense.

Too bad that Sean Carroll hasn't been able to get this basic piece of common sense yet, after a decade of futile attempts to understand the basics of statistical physics.

And that's the memo.


  1. Happy New Year, Lubos!
    Všechno nejlepší v novém roce! :)
    Thanks for the post, it is a wonderful gift!

  2. Happy new year Lumo and to the whole TRF community :-D

  3. ... and thanks a lot for this nice end of the year article :-)

    But Priors are always objectively right by definition :-P, see:

  4. Doncha just love Soviet-era brutalist architecture!

    Happy New Year lazy Lubos (too lazy to type out all three prime factors of 2013!)

  5. There are simple situations in which one can make retrodictions successfully. The best example is orbital and planetary mechanics. Some of the ancient Egyptian monuments have a stellar alignment that is incorrect today because of the precession of the equinox. Even in that case, if one tried to go too far back in time, chaotic dynamics would get you. Your main point still holds since one is assuming an initial condition.

  6. Happy new year to Lubos and all the followers of this blog! We were looking for a not so politically correct science blog (physics in particular), and a friend of ours (an ex maths student) recommended your blog! It's good to see a physicist with balls to say what he thinks without fear of being perceived as politically incorrect! You know what I meant! :)

  7. Happy New Year!

    't Hooft probably has the earliest correct "announcement" of the Higgs Particle mass from this 2001(!) interview

    [quote]In fact, most of us are convinced that the
    observation of the Higgs particle is just around the corner. In fact, you
    may have heard the rumor that at CERN they were just about to make
    the discovery but unfortunately the machine had to be shut down. There
    is going to be a more powerful machine there. We just keep our fingers
    crossed that they were probably right and the mass of the Higgs is around
    125 or so GeV. If not, it might then be a little bit heavier but even then
    it will be detected fairly soon, say, within about five to ten years.[/quote]

    From: Candid Science IV - Conversations with Famous Physicists p. 123

    ( google "candid science iv djvu" )

  8. Happy New Year, Luboš et al !!!

  9. Happy New Year! :)

  10. brothersmartmouth Jan 1, 2013, 4:51:00 AM

    Time doesn't scatter the papers on my desk, I do. And it takes less effort to reorganize them. As a verified layman, this seems like a bad analogy. Is energy just lost into space?
    Happy New Year! Keep it up Mr. Pilsen, and a predictably informative 2013.
    A New Year's question that I need to know,
    Will our universe eventually end up at absolute zero forever?

  11. It's not laziness, it's hard work I did to simplify the material for the readers as much as possible because it's surely easier to remember 2 factors and "discover" that 33 = 3*11 than to remember 3 factors and feel that all the discoveries have been scooped by others. ;-) Happy New Year, LM

  12. Right, celestial dynamics and especially planetary orbits is "reversible" in this sense so the retrodictions ultimately end up being fully analogous to predictions.

    The reason is that we deal with "complete information" about the relevant degrees of freedom (except for the limited precision; but all "qualitative" relevant pieces of information are known). I was talking about a more general or generic case.

  13. This prediction, especially with the right Higgs mass, surely sounds like a prophecy but all the evidence I see - correct me if I am missing something - indicates that he was just guessing or choosing a reasonably low number that was still far enough from the exclusion limits of that time.

    Moreover, the estimate of the discovery date wasn't right because it was 11, not 5-10, years away. ;-)

  14. 't Hooft was probably talking about the ALEPH 115 GeV excess events in the Higgs search

    which were not confirmed by the other three experiments.

    There were not enough statistics and if LEP2 had continued, maybe the Higgs would have been found then.

  15. Happy New Year to all, and let entropy increase :).

  16. Except that he apparently said the correct 125, not 115, GeV. ;-)

  17. Marcel van Velzen Jan 1, 2013, 12:07:00 PM

    Happy New Year Lubos,

    "If you knew everything about you and the universe then the future would be clear to you" WHAT??? I thought quantum mechanics was about non commuting operators and probabilities of things happening?

  18. Happy new year Lubos. I do not pretend to understand a fraction of your posts but I look forward to the bits I do with relish.

  19. Maybe he accidentally said 125 instead of 115 or maybe the interviewer misheard him, or maybe...

    't Hooft suggesting the mass at around 125GeV in 2001 sealed its fate in some kind of weird superdeterministic cellular automatasic fashion. :-)

  20. thejollygreenman Jan 1, 2013, 3:25:00 PM

    All the best for 2013 Squire!

    May you have a rich harvest from the tree of knowledge that evidently grows in your back garden.

  21. There is something I don't understand, if someone could answer. When you try to predict the past and you have all the necessary information, can't it be predicted? It could be predicted if you know all the changes that happened in the system, right?

  22. Bonne et heureuse Année à tous sur TRF.

    And if you can't sweeten your tea with lemon then pour rum in it... great if you have a cold ;-) (it also works without the tea)

  23. Hi Luboš.

    The sentence "sweeten your tea by adding lemon" is not an oxymoron, if you have previously eaten the Synsepalum dulcificum

  24. LOL, fun plant. But I would say that you only sweeten that tea once you pour it into your mouth i.e. you sweeten it by drinking it. ;-)

  25. In the same volume an impressive prediction from a September 2000 interview from one of the most remarkable theoretical physicists ever (Yuval Ne'eman - read the entire interview to understand why he is so remarkable),

    They are waiting for the new accelerator, the LHC, which will be completed in 2005. My model predicts that the mass of the Higgs is twice the mass of the W, which is 85 GeV, so it will be 170 GeV. However, there is a renormalization correction because at high energy the mass is a function of energy. The energy at which this Pythagorean result holds is a higher energy. We calculated the mass of the Higgs at lower energy and it comes out as 130 ± 10 GeV. My prediction is the only prediction in the field. No other theory says anything about the Higgs – except for ordinary supersymmetry, which then requires the existence of lots of new particles.