## Thursday, May 22, 2008

### SciAm prints Sean Carroll's fragmented pottery

At the beginning of the 21st century, fragmented pots are immensely popular with the media. The old-fashioned discrimination against wrong ideas in favor of the correct and promising ones, also known as the scientific method, has to go. There is an urge to undo all this "injustice", previously known as science. Positive discrimination must be put firmly in place, most journalists think.

#### Thermodynamics and statistical physics

When it comes to thermodynamics and statistical physics, Sean Carroll is indisputably a good example of a fragmented pot. That's probably why the Scientific American magazine published his incredibly ill-informed piece:
> Does Time Run Backward in Other Universes?
>
> One of the most basic facts of life is that the future looks different from the past. But on a grand cosmological scale, they may look the same.
Now, we have already seen his crazy statement that cosmology is behind the laws of thermodynamics. In this context, however, he has brought his basic ignorance to a completely new level of insanity. Not only is cosmology supposed to be the driver behind the thermodynamic phenomena; the regime where the difference between the past and the future is supposed to disappear is, in fact, the "grand cosmological scale" itself!

You should probably imagine cosmic deflation initiated by re-cooling and supernovae that are unexploding, sucking huge amounts of photons from the environment directly into their centers, or - even more dramatically - black holes (OK, "white holes", because Sean Carroll also misunderstands that they are the same thing at the quantum level; Hawking 1975, 600+ citations) that spontaneously decay into pairs of black holes, or black holes that smoothly become stars where dead complex animals are resurrected, begin to get younger, and end their evolutionary journey as microorganisms. ;-)

The previous paragraph is not an exaggerated joke: Sean Carroll wrote a blog article promoting his SciAm work whose only content is the statement that the processes we mentioned are "real" in some "other Universes". He even thinks that there is a mystery why it's different in ours!

Do the concepts and propositions of thermodynamics, including the second law, break down in the cosmological regime, as Sean Carroll tells us?

As every undergraduate student of thermodynamics and statistical physics knows or should know, just the opposite statement holds in reality. It is the microscopic laws, directly relevant for the very small, simple, and fully described states of matter, where the difference between the past and the future is absent (or where, at the very least, the CPT theorem holds). In all larger systems, namely systems with nonzero entropy, the difference between the past and the future is inevitable.

In sharp contradiction with Carroll's assertions, the larger the systems you consider, the more dramatic the difference between the past and the future you experience (or can derive). This difference is obviously maximized when you consider the whole Universe.

Why is it so? Well, some of the students mentioned above may still remember the notion of the thermodynamic limit of statistical mechanics, in which the number of particles (more precisely, their entropy) goes to infinity.

In this limit, the equations of statistical physics simplify, the effect of conventions and detailed assumptions (for example, the choice of microcanonical or canonical ensembles) vanishes, and all the important quantities converge to some limiting values. These values are the same values that are also described by other laws of physics known as thermodynamics - approximate laws that don't have to assume (but also cannot say anything about) the atomic character of matter.

In thermodynamics, the symmetry between the past and the future is always broken. All kinds of time-reversal-asymmetric terms, including those that govern friction, diffusion, or decoherence, arise in these effective laws relevant for the thermodynamic limit. Nevertheless, these laws can still be derived by the methods of statistical physics applied to the microscopic, fundamental laws (describing the same system we study, not cosmology!).

Normally, I wouldn't believe that a physics PhD could be ignorant about these basic facts. And a publication of nonsense that contradicts them by a magazine that is not supposed to be a completely dumb tabloid would be unthinkable. Clearly, it is not unthinkable in the Scientific American magazine.

#### The emergence of the past-future asymmetry

The microscopic laws are typically time-reversal-symmetric. In quantum field theory, the T symmetry may be broken (in reality, it is broken by weak effects associated with the mixing of quarks). But as Wolfgang Pauli proved in his CPT theorem, Lorentz-invariant theories always obey the symmetry with respect to the transformation that combines time reversal with parity and charge conjugation, namely the CPT conjugation.

OK, I think that all people who write about physics understand this CPT-symmetric starting point. What many people apparently misunderstand is the emergence of the asymmetry between the past and the future. All of this asymmetry can be linked to the second law of thermodynamics that says that the entropy of macroscopic systems is smaller in the past than it is in the future.

Because we have reduced the asymmetry to a statement about entropy, we should understand what entropy is. For our purposes, it is the natural logarithm of the number of microstates "N" that are macroscopically indistinguishable. This logarithm used to be multiplied by Boltzmann's constant, "k", but modern theoretical physicists typically use units of temperature where "k" equals one. Nevertheless, I want this point to be really powerful, so let us restore "k" and look at Boltzmann's tomb, which bears the famous engraving
S = k log W

This is a dead giant of physics who figured out most of these things more than a century ago. The letter "W" represents the volume of the classical phase space, but when you correctly relate the concepts of classical and quantum physics, it should be replaced by the number of quantum microstates, "N".
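In "k = 1" units, the engraving reduces to S = ln N, and the formula can be sanity-checked on a toy system. A minimal sketch (the function name and numbers below are illustrative, not from the article):

```python
import math

def boltzmann_entropy(num_microstates, k=1.0):
    """Boltzmann's formula S = k ln W, with W replaced by the number
    of indistinguishable quantum microstates N."""
    return k * math.log(num_microstates)

# n unknown two-state spins give N = 2^n microstates, so S = n ln 2 for k = 1
n = 50
S = boltzmann_entropy(2 ** n)
print(S)  # equals 50 * ln 2, approximately 34.657
```

The additivity of entropy for independent subsystems is visible here: each extra unknown spin multiplies N by two and adds ln 2 to S.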

In the previous paragraphs, I talked about the "number of states that are indistinguishable". You might say that it depends on your abilities (and conventions) to distinguish. And you would be right. It does depend. But in the thermodynamic limit, the logarithm of such a huge number is actually universal. The details of how you define "indistinguishable states" can only influence the entropy of physical systems by subleading corrections that become negligible in comparison with the overall entropy, as long as the entropy itself is large.

Below, I will explain why the entropy always increases. The statement is relevant whenever the entropy is nonzero and it becomes very important and exact whenever the entropy is large. I emphasize that the conclusion - and the laws of thermodynamics - apply to all systems with nonzero entropy, even though the "visual" character of low-entropy states and high-entropy states may depend on the context.

For example, high-entropy states of gases look uniform, but high-entropy states in systems where the gravitational force is the king look very non-uniform; black holes, in fact, maximize the entropy in the gravitational context. But this difference is only "visual", while there are more invariant facts that are universal.

It is always true that there are some microstates, their number determines the entropy, the entropy increases with time, and other thermodynamic laws (about the relationships between temperature, entropy, and heat capacity, among other things) can be derived from the spectrum and dynamics of the underlying microscopic theory, as long as we know how to use the fact that the number of microstates is very high (i.e. to take the thermodynamic limit).

#### Time-reversal asymmetric toy model

Because the asymmetry between the past and the future doesn't exist when the entropy is zero and when we consider very simple and exactly described microscopic systems, and because the asymmetry is clearly huge for high-entropy systems, it should "emerge" when the entropy is "somewhere in between", so to speak - nonzero but comparable to one.

Indeed, it does emerge. Let us see how.

The concept of entropy refers to the notion of "macroscopically indistinguishable states". So we need to be asking questions about a subset of physical concepts only. We need to deal with "incomplete information", if you wish. At the same time, we want the toy model to be simple enough for its microscopic description to be transparent.

The best thing I can offer you is the evolution of a physical system with one bit of missing information in the past and one bit of missing information in the future. ;-) You might try to invent examples where the information would be smaller than 1 bit but they would be more subtle conceptually.

#### Higgs-electron scattering

Consider the initial state with one spinless particle, for example a Higgs boson, colliding with a spin-1/2 particle, an electron. The velocities of both particles are known. The polarization of the electron's spin is not: that's the uncertain bit. I deliberately chose one of the particles to be spinless in order to reduce the incomplete information to a single bit.

These two particles scatter and the final state contains one Higgs boson and one electron, too. The velocities are known, the spin of the electron is not. The latter is the uncertain information that gets vastly expanded if we consider macroscopic systems: it contains all the unknown and/or irrelevant atomic degrees of freedom of the pieces of matter.

#### Probabilities: computation

Now, what is the probability that the scattering occurs, with the known velocities but unknown or unreported spins? If the spin were absent, we would simply calculate the complex amplitude "M" for the evolution (between the normalized initial and final states) and
P = |M|^2
would be the probability. But here we have four different complex scattering amplitudes, depending on the electron's spin,
M = {M_{fi}},   f, i = up or down
Which of them should we square, and how should we add them, if we're only interested in the probability of the scattering process with given initial and final velocities but unreported polarizations of the spin?

The key point is this. We must average the squared amplitude over the "N_i = 2" initial states but we must sum it up over final states:
P = (1/N_i) ∑_{i,f} |M_{fi}|^2
If you're a particle physicist who has never studied any thermodynamics but you attend courses of quantum field theory ;-), this is the first formula where the past (=the initial state, by definition) and the future (=the final state, by definition) enter asymmetrically. In fact, all the asymmetry in the world can be reduced to the formula above and its obvious generalizations. So it is important to understand its origin and its consequences in detail.
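The rule can be made concrete with a hypothetical, unitary 2×2 spin-amplitude matrix; the numbers below are made up, chosen only so that the toy evolution is unitary:

```python
# M[f][i]: amplitude for the electron spin to evolve from i to f,
# with all velocities held fixed (illustrative unitary matrix)
M = [[complex(1, 1) / 2, complex(1, -1) / 2],
     [complex(1, -1) / 2, complex(1, 1) / 2]]

N_i = 2  # indistinguishable initial spin states

# average over the initial states, sum over the final states
P = (1 / N_i) * sum(abs(M[f][i]) ** 2
                    for i in range(2) for f in range(2))
print(P)  # approximately 1.0
```

Because we summed over all final states of a unitary matrix, the probability comes out as one, as it must; if only a subset of final states counted as "the scattering occurred", P would drop below one, while the "1/N_i" prefactor would stay.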

#### Consequences

Let me start with the consequences. In our toy example, the number of indistinguishable (or indistinguished) microstates was 2 both in the future and in the past. But we could have clearly considered situations where "N_i" and "N_f" are different. For macroscopic systems, both of these numbers are huge - comparable to Avogadro's number.

You can see that it is "1/N_i", not "1/N_f", that appears as a prefactor. (The time-reversed partner of the formula would have "1/N_f"; it would be a different formula that is nevertheless isomorphic as long as you switch your terminology about what you mean by the initial states & the past and by the final states & the future. In physics, we define the initial states and the past to be those that generate the "1/N_i" prefactor.)

You could also hypothetically imagine a different world where the prefactor is the geometric average of the two possible prefactors, "1/sqrt(N_i N_f)". Such a world would be time-reversal symmetric at the macroscopic level. But it is not our world. In that world, the rule that the probability of (final states) "A or B" equals the probability of "A" plus the probability of "B" minus the probability of "A and B" (which is zero for mutually exclusive "A, B") would be violated, because square roots don't obey linear laws such as "sqrt(2)+sqrt(3) = sqrt(5)". ;-)

I will explain the origin of the prefactor later. But what does it mean that the prefactor is "1/N_i" and not "1/N_f"? It means that the probability gets (vastly) smaller if the number of indistinguishable initial states is too high (or vast). In other words, the formula implies that the evolution is much more likely if the initial state has a very small number of indistinguishable microstates, i.e. a very small entropy.

On the other hand, the formula doesn't "punish" you for having too many final states: "1/N_f" doesn't appear as a prefactor. Because we still sum over the final states, we get a higher number if we sum over many of them. That's why the final states with a high number of indistinguishable counterparts - with a high entropy - are favored.

So far, the qualitative comments above have neglected the value of "M_{fi}". Let us imagine that we deal with a basis of microstates in which all matrix elements "M_{fi}" are equal to "m" (up to a phase), a small number. That is of course inaccurate, but it gives us reasonable ideas about the scaling of the probability. Then the sums over the initial states and the final states give us simple factors of "N_i" and "N_f", respectively. In other words, the previous displayed formula reduces to
P = (1/N_i) N_i N_f |m|^2 = N_f |m|^2
In this parameterization, the probability only depends on the number of final states, not on the number of initial states. The more indistinguishable final states you have, the higher the probability is. Because the number of indistinguishable final states is dictated by the entropy, we have
P = exp(S_f) |m|^2
The probability increases with the exponential of the final entropy (in the "k=1" units), as long as the complex amplitudes "m" are kept constant and universal. That's why the evolution always favors high-entropy states in the future but it doesn't care about the entropy in the past.
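A quick numerical illustration of this scaling, with hypothetical values of the final entropy and of "|m|^2" (in "k = 1" units):

```python
import math

def evolution_probability(S_final, m_squared):
    # P = exp(S_final) * |m|^2 for universal matrix elements, k = 1 units
    return math.exp(S_final) * m_squared

m_squared = 1e-6  # made-up universal squared amplitude

# raising the final entropy by 2 boosts the probability by exp(2), about 7.39,
# regardless of the (held-fixed) amplitude
ratio = evolution_probability(5.0, m_squared) / evolution_probability(3.0, m_squared)
print(ratio)
```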

We can parameterize the formula in one more way. Imagine, for example, that "m_{fi}" is the square matrix of the discrete Fourier transform. All of its entries satisfy
|m|^2 = 1/N_{total}
where "N_{total}" is the total dimension of the relevant (or effective) Hilbert space, which is common to the initial states and the final states. We may write
N_{total} = exp(S_{maximal}),
the total effective Hilbert space can store a certain maximum entropy. Then the probability of a particular evolution is
P = exp(S_f - S_{maximal}).
Clearly, the evolution is completely dominated by the macroscopic final states of the highest entropy - whose "S_f" is as close to "S_{maximal}" as possible. Let me emphasize once again that the value of the initial entropy, "S_i", doesn't matter. Just look at the formula. Anyway, I am certain that dozens of insufficiently bright readers will tell us in the fast comments that "my theory" predicts that the entropy must be maximized at all times.
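The discrete-Fourier-transform parameterization is easy to verify directly: every entry of the unitary N×N DFT matrix has squared modulus 1/N, so the probability reduces to N_f/N_total. A sketch with made-up values of "N_total" and "N_f":

```python
import cmath, math

N_total = 8  # dimension of the toy effective Hilbert space

# unitary discrete Fourier transform matrix, entries exp(-2*pi*i*f*j/N)/sqrt(N)
dft = [[cmath.exp(-2j * math.pi * f * j / N_total) / math.sqrt(N_total)
        for j in range(N_total)]
       for f in range(N_total)]

# every squared entry equals 1/N_total = 0.125, independently of f and j
m_squared = abs(dft[3][5]) ** 2
print(m_squared)

N_f = 4  # indistinguishable final microstates, S_f = ln(N_f)
P = N_f * m_squared  # = N_f / N_total = 4/8 = 0.5
print(P)
```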

Depending on the way how we parameterize the time-reversal-symmetric, microscopic matrix elements "m_{fi}", the formula either means that low-entropy initial states are preferred or high-entropy final states are preferred. At any rate, the processes whose final entropy (strongly) exceeds the initial entropy have a (much) higher probability.

Incidentally, if you want a parameterization in which the difference of the entropies appears explicitly, here it is. Write the "universal" value of "|m|^2" as "h / sqrt(N_i N_f)", which is still time-reversal symmetric. Then one of the previous formulae becomes
P = N_f |m|^2 = sqrt(N_f/N_i) h = exp((S_f - S_i)/2) h
where "h" was defined in a time-reversal-symmetric fashion. Nevertheless, you see that the probability explicitly tries to maximize the increase of the entropy. The factor of "1/2" in the exponent may look surprising but it never appears in answers to more specific questions because more specific questions always treat the past and the future asymmetrically.
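The equivalence of the two forms can be checked numerically with hypothetical microstate counts and a made-up constant "h":

```python
import math

N_i, N_f = 4, 16   # made-up numbers of indistinguishable microstates
h = 0.01           # time-reversal-symmetric constant, |m|^2 = h / sqrt(N_i * N_f)

m_squared = h / math.sqrt(N_i * N_f)
P = N_f * m_squared

# the same number from the explicit entropy-difference form
S_i, S_f = math.log(N_i), math.log(N_f)
P_entropy_form = math.exp((S_f - S_i) / 2) * h

print(P, P_entropy_form)  # both equal 0.02 up to rounding
```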

At this moment, you should understand that all of the observed effects where entropy increases can be reduced to the understanding of the formula above; we will be justifying it later. But I want to say that even before you understand thermodynamics and statistical physics, certain statements may be seen to be incompatible with rational reasoning a priori. In his SciAm article, Sean Carroll writes:
> Nevertheless, over the years we have developed a strong intuition for what counts as "natural"—and the universe we see does not qualify.
I apologize, but if you develop a certain intuition (or methods or a theory) that implies that most processes in the Universe - including the breaking of eggs, the explosions of supernovae, and virtually everything else we have observed - are "unnatural", then your intuition (or methods or theory) is falsified. And rational scientists couldn't have "developed" such a completely bad intuition by following the scientific method and by taking the observations into account. You could have seen that Carroll's speculations are wrong even before you understood the formula for "P", simply because they contradict almost every observation we have ever made.

But it is of course not a defect of the laws of physics as understood in the 21st century; it is a fault of Sean Carroll's (mis)understanding of them. Physics is ultimately based on observations so it is able to prove that certain "theories" are just pure bunk and that certain cosmologists are fragmented pots.

#### Justification of the asymmetric formula for "P"

Now, once we have seen that the formula arising from our toy model is behind the universal increase of entropy in all high-entropy systems in the world and behind the corresponding asymmetry between the past and the future, we should discuss the question of its origin a bit more comprehensively.

So why do we divide the sum of the squared amplitudes by the number of initial states, but we don't divide it by the number of final states?

Let me start with the final states. We are literally asking the question "what is the probability that a process occurs, with the final state being one of several mutually exclusive states?" It is very clear how this probability must be determined. We compute the probabilities of the individual final microstates and add them up. This is not a shocking new rule found by physics but a basic law of logic. If two possible events, "A" and "B", are mutually exclusive, then the probability is additive:
P(A or B) = P(A) + P(B).
So it is clear how to deal with the final states. If we don't care about some details of the final state, we take the probabilities for the individual microstates - and they didn't have anything in the denominator, recall the first simple "|M|^2" formula for the case where both initial and final states are exactly known - and we sum them over the final microstates that are indistinguishable. This result follows from a rule of mathematical logic.

So how is it possible that the factor "1/N_i" is there? Shouldn't our approach to the initial states follow the same formulae? Shouldn't we sum over the initial states instead of taking the average? The answer is a resounding No.

Why? When we were calculating the probabilities involving one of many possible indistinguishable final states, none of them had to occur. Instead, a completely different state - from a different ensemble - could have been chosen. When we're interested in two final states, "Final_1" and "Final_2", it is not yet true that
P(Final_1) + P(Final_2) = 1   (no!)
It is not true that one of the states we're interested in will have to occur. On the other hand, when we're asking the same probabilistic question about a class of initial states, one of them has to occur:
P(Initial_1) + P(Initial_2) = 1   (yes!)
The total probability that an initial macroscopic state evolves into a final macroscopic state is given by the probabilities that particular microscopic representatives of the initial state evolve as they should; but we must also multiply them by the probability that the initial microstate occurred in the first place. And the latter probability is not one. The most typical "prior" probability, corresponding to maximal ignorance about the initial state, sets
P(Initial_1) = P(Initial_2) = 1/2
So here we have a new way to write the formula for "adding over final" and "averaging over initial" microstates:
P = ∑_{i,f} P(Initial_i) |M_{fi}|^2
Here, "P(Initial_i)" is the prior probability that the particular initial state was realized in the first place; it equals "1/N_i" in the case of complete ignorance among "N_i" indistinguishable states. The squared amplitude is the conditional probability that a given final state occurs for a given initial state.
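In code, the prior-weighted form looks as follows; the amplitude matrix is the same kind of hypothetical unitary 2×2 toy matrix as before, and uniform priors reproduce the "1/N_i" average (all numbers illustrative):

```python
# M[f][i]: toy 2x2 spin amplitudes (made-up unitary matrix)
M = [[complex(1, 1) / 2, complex(1, -1) / 2],
     [complex(1, -1) / 2, complex(1, 1) / 2]]

priors = [0.5, 0.5]  # complete ignorance among N_i = 2 initial states

# P = sum over i, f of P(Initial_i) * |M_{fi}|^2
P = sum(priors[i] * abs(M[f][i]) ** 2
        for i in range(2) for f in range(2))
print(P)  # approximately 1.0, matching the 1/N_i-averaged formula
```

Nonuniform priors - partial knowledge of the initial spin - would simply reweight the terms; no analogous weights ever multiply the sum over final states.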

Why is there no factor of the "prior probability" for the final states? Because a "prior" for "final" states is an oxymoron. Prior means "earlier" or "first" in Latin. Priors can't be final. This is not just about linguistics. It is about the laws of logic. What are the formulae? See e.g. this definition of conditional probability:
P(A and B) = P(B) P(A given B).
Replace the symbols "A, B" by "Final, Initial":
P(Final_f and Initial_i) = P(Initial_i) P(Final_f given Initial_i)
This is just a basic identity of mathematical logic. If you want to calculate the probability involving an initial state and a final state, you must know the probabilities that the evolution occurs (or that the implication is valid) but you must also know the prior probabilities that occur as prefactors.

If you decided to switch the roles of the "initial" and "final" adjectives in the latest formula, you would still get a formula that is correct at the abstract logical level, but you wouldn't obtain a new usable recipe to calculate probabilities of evolution. Why? Simply because the "prior" probabilities of the final states, "P(Final)", would appear in your formulae. They are not only linguistically inconsistent but also unknown. ;-) It's just how the world works. The future evolves from the past, and the only way the future can be determined (predicted, at least partially) is to know the laws of evolution and to know (something about) the past.

Whoever is using "prior probabilities" of future events to say something about the future - for example, whoever says that the future must be dim because of the illogical premises of environmentalism - is a bigot. The only thing that can be known, at least partially, is the past, and it is therefore the past assumptions only that can occur as arguments of "prior probabilities". The future is free and must be free - it is whatever the present will lead to - and it is impossible to assume probabilities about the future by "priors". This statement of mine is not ideological in any sense, it is an inevitable result of logic combined with causality, and this is where the time-reversal asymmetry of all processes involving incomplete information resides.

#### Science vs time-reversed science

To make it very clear why science needs to assume prior probabilities of the past (or events at time "t") but not prior probabilities of the future (or events at later times that are being linked to "t"), here are a couple of examples of how explanations look in science and how they look in Sean Carroll's time-reversed science. In conventional science,
• the existence of simple organic compounds or simple organisms billions of years ago can be shown to lead to more complex compounds or organisms, because of natural processes including natural selection, but the exact structure of the complex organisms is hard to predict accurately
• the current configuration of the Earth's climate is used as an assumption (different configurations are given different prior probabilities) and together with the - very inaccurately known - dynamical laws, we may try to calculate the probabilities whether all the humans will be fried by 2100 ;-)
• using the laws of genetics, simulations of natural selection in the past, and the observed social patterns of humans (and other mammals), we may try to calculate (or estimate) the probability that a woman wins the Fields medal; the result will be roughly 2 orders of magnitude below the same probability for a man
• the observations and theories constructed out of them (including the explanations in this article) seem to imply that the entropy universally increases with time; this insight may also be applied - by extrapolation or generalization - to extreme parts of cosmology, including the early Universe, to argue that the entropy of the early Universe was much smaller than today and maybe very small
And here are the corresponding examples of Sean Carroll's time-reversed science (T-science):
• the existence of humans today (or in the future) is an assumption because it is a purpose of the Universe; we use this knowledge, a prior probability, to deduce statements about the past; for example, one of the consequences is that there had to be God who created the humans by hand: the entropy increases into the past so the humans had to evolve backwards into something even better, and God is the only option
• the human activity apparently ruins the planet; it is thus T-reasonable to expect that we will be burned by 2100: this statement has a high prior T-probability; using the T-scientific methods, we can T-derive all kinds of T-theorems, for example that all climate skeptics are stooges of the oil industry
• we may T-reasonably assume that the T-natural outcome of the life in the society is that all people are equal; for example, there should be as many female Fields medal winners as the male ones; because this T-fact doesn't seem to be satisfied, we may T-scientifically derive that the Fields medal committee (plus all other committees and individuals who influence anything about women in maths) are made out of sexist pigs who should be arrested; we also know that the working class will be ruling in the future where everyone will be equal and where the private ownership will be abolished; this "future prior" can be T-scientifically used to execute everyone who prevents Nature from evolving into Her future, as proved by the T-scientific ideology of Marxism
• it is T-reasonable to assume that generic states are always preferred, without adding any detailed disclaimer; it T-implies that the entropy had to be higher in the past than what it is today; we may ignore all data from the past - including all macroscopic phenomena we have ever observed - that contradict this proposition because we have already chosen priors about the past and about the character of physical laws that are philosophically pleasing ;-)
I could give you many more examples but the message should be clear. When we're doing science, it is only the data about the present and the past that are available. The data about the future are not available right now, by the very definition of future. So the assumptions about the future can never be directly justified by observational evidence; they cannot enter our formulae as "independent variables". It is always the observations from the past that can be used by science to construct theories and to use them to both reconstruct the past and predict the future.

In this process, the evaluation of evidence from the past is used to refine our knowledge about all the priors - about the other, directly unobserved, features of the initial state as well as the dynamical laws of Nature. These insights can be subsequently used both to retrodict other things and to predict the future. But if you read this paragraph carefully and think about it, one thing is clear. At the end, all calculated features (and probabilities) of the past world as well as the future world are functions of quantities in the past (or their probabilities) as long as you are doing science.

The only way how one can "seemingly" revert the role of the past and the future is to switch the definitions of these two words (and "imagine" that the processes run backwards). What you end up with is a mathematically isomorphic logical framework. But such a change of notation is a trivial linguistic (or "psychological") exercise that obviously cannot have any physical consequences. It cannot teach us anything. It cannot predict anything. It is a redundant choice of conventions, much like if we decided to switch the meaning of the symbols "+" and "-" or to write the words in papers backwards. There is only one inequivalent meaningful logical framework, one that is sketched above, and it cannot "co-exist" with its mirror images in any way.

If you use "time-reversed" formulae where things are calculated ("predicted") from the future that you assumed to look in one way or another (or to have certain probabilities of some outcomes), without calculating them from the information residing in their past, then you are a fragmented pot. And if you're an editor of a journal who prints an article about this topic written by a fragmented pot, you are a fragmented pot, too.

And that's the memo.