Kip Thorne hasn't conceded yet; I think that his position has become indefensible over the years.

Off-topic: Israel Gelfand would have celebrated his 100th birthday today.

In July 2005, Hawking wrote a paper (TRF comments) in which he presented his own arguments why the qualitative outcome was different than he used to think. Even though Hawking would admit that the developments in string theory and especially AdS/CFT were the main advances that made him change his mind, his 2005 ideas were presented as great new insights by Hawking and some of the journalists. Your humble correspondent and most experts in the field were skeptical. I wasn't hiding my skepticism either but it seems clear that I was more sympathetic to those ideas than most others.

The recent discussions about block non-diagonalizability of the black hole evolution operator in a classically accessible basis, as well as the impossibility of identifying local operators in a background-independent way, have strengthened my feeling that Hawking was ahead of his time when he pointed out an important feature of the evolution:

For a proper understanding of the black hole information puzzle, it's important to properly (quantum mechanically) treat superpositions of classically distinct black hole microstates; and, which is related, to include the interference between the histories with different intermediate states (black hole is not there; black hole in one location/shape/decoration is present).

This important observation wasn't emphasized just by your humble correspondent. I would say that Papadodimas and Raju; Nomura, Varela (and sometimes Weinberg); and Hsu consider the revelation above to be an important part of their knowledge that makes it clear that the arguments that black hole firewalls have to exist are flawed.

A key paragraph from Hawking's 2004 concession speech said:

Information is lost in topologically non-trivial metrics like black holes. This corresponds to dissipation in which one loses sight of the exact state. On the other hand, information about the exact state is preserved in topologically trivial metrics. The confusion and paradox arose because people thought classically in terms of a single topology for spacetime. It was either \(\RR^4\) or a black hole. But the Feynman sum over histories allows it to be both at once. One can not tell which topology contributed to the observation, any more than one can tell which slit the electron went through in the two slits experiment. All that observation at infinity can determine is that there is a unitary mapping from initial states to final and that information is not lost.

For Hawking, the interference between the black-hole-containing and black-hole-free intermediate states is what restores the purity of the final state. Most of us had the same feeling as the feelings recently shared by Scott Aaronson:

I should confess that I don’t understand this argument (and apparently I’m not alone – even Preskill, to whom Hawking conceded, said he didn’t understand it!). But Hawking does seem to be clearly asserting that the solution to information loss involves there being a nonzero amplitude for the black hole never forming in the first place. (Though an obvious issue is that he doesn’t say how large the amplitude is: if it were nonzero but exponentially small, that wouldn’t seem to help much.)

This complaint against Hawking's "key role" of the black-hole-free intermediate state sounds very natural. After all, it seems intuitively "obvious" that the contribution is either tiny, in which case it can't have the potency to convert the near-maximally mixed thermal final state into a pure one; or the black-hole-free intermediate states are dominant, but then we don't have any explanation why the evolution looks like any events in a black-hole-containing spacetime at all.

*Off-topic: A dancing 3D model of a Calabi-Yau manifold. A girl with a 3D printer may print them for you. Via tweeting Maria Spiropulu.*

I have repeatedly written the same objection against Hawking's thoughts in the past although my formulations were never as clear as they are today. However, with some newer realizations, I believe that the complaint is at least morally wrong. The basic weapon that challenges the intuitive explanation from the previous paragraph was articulated in the following blog entry and the paper mentioned therein:

Hawking radiation: pure and thermal mixed states are a micron away

The text argues that in a "truly generic" basis of the \(\exp(S)\)-dimensional Hilbert space relevant for the CV of a black hole, the pure and (maximally or near maximally) mixed density matrices may only differ by exponentially tiny matrix elements of order \(\O(\exp(-S))\).

There is something that I find a bit demagogic about this December 2012 text of mine today: the mixed density matrix has (diagonal) entries of order \(\O(\exp(-S))\), too. So while it was "small" in an absolute sense, the correction needed to perturb the approximate mixed final state to a pure state has to possess matrix elements that are of order \(\O(100\%)\). There was no wrong claim in my blog entry but I was sort of hiding this fact.

But this update doesn't really invalidate the point of the "micron" essay qualitatively. The point is that the off-diagonal entries that are comparable to the diagonal ones may still be invisible to the semiclassical calculations – in fact, they may be invisible at all finite orders of perturbation theory.

**Path integral for mechanics**

Let me begin with a physical system that has been understood for quite some time: non-relativistic quantum mechanics. In Feynman's path integral approach, the evolution amplitudes are computed as the functional integral\[

{\mathcal A}_{f\leftarrow i} = \int {\mathcal D}x(t)\,\exp(iS/\hbar)

\] over all trajectories that begin and end at the right places. Note that all trajectories, however weird, contribute equally (as far as the absolute value goes). If you made an error and treated the observable \(x(t)\) classically, you would expect the integrand to be nonzero only for the correct classical trajectory and to vanish everywhere else.

Quantum mechanics says something different. The classical trajectory is "highlighted" in the classical limit because all the trajectories that sufficiently differ from the classical solution tend to have a "random", quickly variable phase as the integrand. These random phases tend to cancel and only the phases \(\exp(iS/\hbar)\) near the extremum of \(S\), i.e. near the classical solution, contribute "coherently" because the phase (the exponent) isn't changing much near the extremum (or extrema).
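The cancellation can be checked numerically in a toy setting. The sketch below is only an illustration under stated assumptions: the path integral is replaced by an ordinary one-dimensional integral, the action is the hypothetical toy choice \(S(x)=x^2\) with its extremum at \(x=0\), and the value of `hbar` and the two integration windows are arbitrary. The integrand has unit magnitude everywhere, yet a window containing the extremum contributes far more than an equally wide window away from it.

```python
import numpy as np

def phase_integral(a, b, hbar, n=400_001):
    """Trapezoid-rule estimate of the integral of exp(i*S(x)/hbar) over [a, b].

    Toy assumption: S(x) = x**2, whose only extremum is at x = 0.
    """
    x = np.linspace(a, b, n)
    integrand = np.exp(1j * x**2 / hbar)  # unit magnitude at every x
    dx = x[1] - x[0]
    return np.sum(0.5 * (integrand[:-1] + integrand[1:])) * dx

hbar = 0.01  # arbitrary small value standing in for the classical limit

# Two windows of equal width: one contains the extremum, the other does not.
near = phase_integral(-1.0, 1.0, hbar)
far = phase_integral(2.0, 4.0, hbar)

print("window around the extremum:", abs(near))
print("window away from it:       ", abs(far))
print("full-line Fresnel value sqrt(pi*hbar):", np.sqrt(np.pi * hbar))
```

Making `hbar` smaller sharpens the contrast, which is the sense in which the classical solution is "highlighted" without any trajectory being excluded.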

**Back to black hole density matrices**

Consider a black hole formed by the collapse of a star in a pure state \(\ket\psi\). The black hole gets formed and then it evaporates. Hawking's approximate 1974 calculation of the final state reveals that the final state is a thermal one (with increasing Hawking temperature as the black hole shrinks), one given by a near maximally mixed density matrix. This result is likely to hold to all orders in perturbation theory.

We know from the AdS/CFT, Matrix theory, and other explicit constructions that the final state is actually pure. So if you describe it by a density matrix, it must be a density matrix of the form\[

\rho_{\rm final} = \ket{\psi}_{\rm final} \bra{\psi}_{\rm final}

\] which must still be rather close to the approximate density matrix \(\rho_{\rm approx}\) that we claimed to be near maximally mixed. They look "qualitatively different" but this type of "qualitative difference" is one that may actually result from tiny or (in practice) hardly observable "quantitative differences".

Let's sensibly assume that in a "classically natural" basis for the Hawking radiation, the final pure state is "generic". It means that in the relevant \(\exp(S)\)-dimensional Hilbert space, all the amplitudes are of the same order i.e. of order \(\O(\exp(-S/2))\) – which is needed for the normalization condition \[

\sum_{i=1}^{\exp(S)} |c_i|^2 = 1

\] to hold. The relative phases between \(c_i,c_j\) are important although the laymen are often led to believe (by sloppy presentations of the Schrödinger cat thought experiment and other things) that only the absolute values matter. The pure density matrix has matrix elements\[

\rho^{\rm final}_{ij} = c_i c^*_j

\] What would you think about the value of the density matrix if you committed a similar error we discussed in the "non-relativistic quantum mechanics" section above? Well, you would do exactly what the laymen usually do when they think that only the absolute values of the amplitudes in a basis matter: you would just keep the diagonal entries but incorrectly set the off-diagonal entries to zero:\[

\rho^{\rm final}_{ij} = c_i c^*_j \cdot \delta_{ij}

\] Apologies, the Kronecker delta must be interpreted "literally" and the usual checks for indices (repeated indices only occur if they're summed via the Einstein sum rule) don't hold here.
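To make the effect of the Kronecker delta concrete, here is a minimal numerical sketch. It rests on assumptions that are not part of the argument above: a small dimension `N` stands in for \(\exp(S)\), and "generic" is modeled by complex Gaussian amplitudes. It compares the typical size of diagonal and off-diagonal entries and the purity \({\rm Tr}(\rho^2)\) of the full and the diagonal-only matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1024  # stands in for exp(S); a real black hole would need a vastly larger value

# Generic pure state (assumption: complex Gaussian amplitudes),
# normalized so that sum_i |c_i|^2 = 1.
c = rng.normal(size=N) + 1j * rng.normal(size=N)
c /= np.linalg.norm(c)

rho_pure = np.outer(c, c.conj())       # rho_ij = c_i c_j^*
rho_diag = np.diag(np.diag(rho_pure))  # same matrix with the Kronecker delta applied

def purity(rho):
    """Tr(rho^2): equal to 1 for a pure state, 1/N for the maximally mixed one."""
    return float(np.real(np.trace(rho @ rho)))

mean_diag = np.mean(np.abs(np.diag(rho_pure)))
mean_off = np.mean(np.abs(rho_pure[~np.eye(N, dtype=bool)]))

print("typical |rho_ii|         :", mean_diag)
print("typical |rho_ij|, i != j :", mean_off)
print("purity of full matrix    :", purity(rho_pure))
print("purity of diagonal part  :", purity(rho_diag))
```

The individual entries of the two matrices are all tiny and of the same order, yet dropping the off-diagonal ones moves the purity from one to something of order \(1/N\).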

Now, my point is that it is perfectly compatible with everything we know – and, ultimately, inevitable – that the semiclassical approximate calculation ends up with a similarly castrated final density matrix as the density matrix with an extra Kronecker delta factor. Why?

Think about two mutually orthogonal microstates of the black hole (or the black hole radiation that results from them), \(\ket i\) and \(\ket j\), which are very similar to one another in some operational classical way of looking at things. For example, they are two black hole microstates that describe the black hole located at positions that differ by a sub-Planckian distance (which still allows the states \(\ket i\) and \(\ket j\) to be orthogonal if the black hole mass is much greater than the Planck mass, and it should be for the black hole interpretation to be OK); or \(\ket j\) is obtained by a creation of a soft photon or another quantum on top of the structure given by \(\ket i\).

What I want to emphasize is that the difference between \(\ket i\) and \(\ket j\) will be inevitably invisible in a semiclassical approximation to any calculation of the evolution of the black hole. If you think about the spinning Earth, you have no chance to distinguish the states of the Earth with the \(z\)-component of the spin equal to \(J_z\) and \(J_z+\hbar\) because \(J_z\gg \hbar\). So all such things are invisible in a calculation that treats \(J_z\) "classically".

In some cases, you may argue that the quantum evolution operator must be diagonal in such "classically indistinguishable microstates", anyway. This diagonal form may follow from the conservation laws (of the angular momentum, for example). The point is that this is *not* true for the differences between black hole microstates \(\ket i\) and \(\ket j\) described two paragraphs above.

For example, when a black hole is emitting the Hawking quanta, there is no reason for its center-of-mass location to be exactly conserved. In fact, we know for sure that it is not conserved. The black hole is recoiled once it shoots a Hawking particle in a specific direction. Referring to the black hole's large mass, such recoils have been largely neglected in all the (semiclassical – and sometimes "more ambitious") calculations of the Hawking radiation. But the black hole is actually moving because of these recoils and the motion resembles the Brownian motion at (very long) timescales comparable to the Hawking evaporation lifetime. It can get very far.
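A rough order-of-magnitude estimate suggests how far. It is only a sketch under stated assumptions: Planck units, all numerical factors dropped, the recoils treated as independent random kicks, and the mass held fixed at \(M\) even though it actually decreases. Each Hawking quantum carries momentum \(p\sim T\sim 1/M\), the number of emitted quanta is \(N\sim S\sim M^2\), and the lifetime is \(t\sim M^3\). Then\[

P_{\rm net}\sim \sqrt{N}\,p\sim 1,\qquad v\sim \frac{P_{\rm net}}{M}\sim \frac{1}{M},\qquad \Delta x\sim v\,t\sim M^2

\] so with these assumptions the accumulated displacement is of order \(M^2\), parametrically larger than the Schwarzschild radius, which is of order \(M\).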

While the changes of some internal properties of the black hole such as the precise sub-Planckian location of the center-of-mass or the \(\O(1)\) changes to the number of soft quanta around it (which may be large) may be neglected for some purposes, they surely cannot be neglected if you want to calculate the final matrix element \(\rho_{ij}\) where \(\ket i\) and \(\ket j\) are two classically "nearby" microstates.

The actual behavior of \(\rho_{ij}\) for a pure initial state should be clear to you: in a "classically natural" basis for the radiation, all matrix elements \(\rho_{ij}\) are of the same order, whether they are diagonal ones or off-diagonal ones. This is clearly implied by the purity and genericity of the microstate. All approximate calculations tend to assume that the final density matrix – and/or the evolution operator – is diagonal or block-diagonal in some basis of microstates that look natural or easily accessible for classical measurements in the final spacetime (e.g. Fock space occupation number eigenstates of the radiation). But this assumption is completely wrong and the full, exact calculation shows that the off-diagonal elements are actually of the same order. One may only rightfully conclude that the off-diagonal elements (of the final density matrix or the evolution operator) are "almost zero" if we average them over many classically similar yet mutually orthogonal microstates. If we really treat them accurately, the off-diagonal elements in a basis of our choice are never negligible relative to the diagonal ones. In fact, I would stress that the off-diagonal elements between "pretty much any two" classically natural states are comparable to the geometric average of the two corresponding diagonal entries.

I believe that the mistake described in the previous sentences and many paragraphs above them is one of the most widespread and crucial mistakes made by Joe Polchinski and many others who end up with incorrect and seemingly paradoxical conclusions such as the existence of a "black hole firewall". They just treat the black hole's own properties – including the metric tensor around it – classically and they believe that the unitarity should hold in each "superselection" sector (with some classical properties; effectively, in each "exact" background spacetime) separately.

But this can't be the case. To guarantee unitarity, it is essential for quantum gravity to have interference – and nonzero off-diagonal matrix elements – between microstates of a black hole that look "similar in the classical approximation" but whose details differ (location of the black hole center of mass measured with a sub-Planckian accuracy and/or infinitely many occupation numbers changing by much smaller additions than their rough classical value, to mention two major examples). Only with this full connectedness of the black hole microstates – nonzero off-diagonal entries through which you can connect (assuming many \(ij,jk,kl,lm\) jumps) a black hole state with any other state (including a black-hole-free state) – is quantum gravity capable of preserving all the principles simultaneously (unitarity, equivalence principle wherever it should hold, and locality in the appropriate approximation).

Oh, man! I risk losing my decent job as a decent physicist if I continue reading all these interesting posts here! :D...

my comments to come soon! Anyhow, looks very interesting!

I was thinking how to edit the text above to emphasize one quantitative fact that wasn't mentioned but it may break the smooth stream of ideas that I see over there, so let me add the comment as a DISQUS appendix here:

When we consider a diagonal entry rho_{ii} of the final density matrix, it was emphasized above that the off-diagonal elements rho_{ij} are of the same order and can't be neglected.

But you might ask: How many additional states "j" connected with the original state "i" have to be accounted for to achieve the purity of the final state?

One could think that only some number of "nearby states j" may have comparably large off-diagonal matrix elements with the state "i" and it's enough to get the unitarity. That would mean that you may assume that the background geometry may be approximately fixed and you never need to deviate from it too much to at least approximately restore the purity.

But this is wrong. It's essential that pretty much all states "j" - exp(S) of them - have non-diagonal matrix entries with "i" (a Fock space basis of the final radiation). Why?

A pure density matrix is (1,0,0,0,0,0) in some basis so Tr(rho^2) is equal to one. On the other hand, a maximally mixed density matrix has rho = unitmatrix * exp(-S) and Tr(rho^2) is just exp(-S) or so. Go backwards. To increase the Tr(rho^2) from exp(-S) to 1 while going from the approximate thermal answer to the pure/unitary exact one, we have to make all matrix elements of rho in a generic basis nonzero and comparable, otherwise we just don't get one (we assume that all the dominant/nonzero matrix elements are of the same order).
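The counting can be spelled out in one line. This is a sketch, assuming that each of the \(e^S\) rows of \(\rho\) has \(K\) non-negligible entries, all of magnitude \(e^{-S}\); the symbol \(K\) is a label introduced here for illustration only. Because \(\rho\) is Hermitian,\[

{\rm Tr}(\rho^2)=\sum_{i,j}|\rho_{ij}|^2\sim e^{S}\cdot K\cdot e^{-2S}=K\,e^{-S}

\] so \(K=1\) (diagonal entries only) reproduces the mixed value \(e^{-S}\), while \({\rm Tr}(\rho^2)=1\) requires \(K\sim e^{S}\), i.e. essentially every entry in every row.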

So the situation is just like in the path-integral analogy where exp(iS/hbar) is equally large in magnitude for *any* trajectory. Pretty much any evolution - any matrix elements etc. - is equally strong, of the same order. This doesn't contradict low-energy effective field theory which always averages over a huge number of microstates.

The mistake of assuming a fixed black-hole background, even if you admit that it may "slightly fluctuate away from it", will still result in a huge error in Tr(rho^2) for the final density matrix rho that will still be much closer to the maximally mixed density matrix than to a pure matrix. To correctly get Tr(rho^2) = 1 for the final state, one really has to acknowledge that the quantum amplitude for the evolution among *any* two states is really of the same order, independently of the matrix element's being diagonal or off-diagonal, block-diagonal or off-block-diagonal in any sense!

so, this is "And" :) ... I still have only very little time but I would like to take more time to read through Hawking's statements and mainly the issues where he was "ahead of time" as they are very much in my spirit. I do agree with LM memo but I have to understand if in the same sense or in a somehow extended one. Up to now, the post in this blog seems accurate and nicely motivated. Just wonder if supersymmetry plays a role...

In his autobiography Susskind describes a scene in which he made the same bet with Hawking in the presence of 't Hooft while all three were visiting in EST guru Werner Erhard's apartment in the Haight Ashbury district of San Francisco. I don't remember the time frame exactly -- it must have been in the late 1960's -- and can't fathom why they were hanging out with this celebrity con man in the first place (they all dropped acid together?).

Dear Lulke, I know the scene - well, the rich guy just liked such company, I know similar guys today.

Let me assure you that the scene couldn't take place before 1974 because there was no information problem before that.

Dear Lubos,

I find this very interesting. But what about this thought: The bigger the black hole the smaller the Hawking quanta and the more perfectly the momentum of the different quanta should average out? On the other hand a micro black hole should jump around wildly.

Dear Mikael, a smaller black hole has "larger" - more precisely heavier - quanta than a larger black hole!

The wavelength of the quanta is comparable to the black hole radius. A smaller black hole means a smaller wavelength i.e. higher frequency i.e. higher energy/momentum i.e. higher temperature.

That's why a smaller black hole jumps around more violently - and also evaporates more quickly.

An administrative remark: When I load this article a download of the Hawking fuzzball talk automatically starts. Don't think that this is intended.

Dear Mikael, I explicitly disabled this for Chrome with VLC, and similar combinations, in the morning by "autostart='false'" parameter of the tag. If it doesn't work for some other browsers/plugins, too bad, you have to click stop manually. If you tell me how to fix it in a reasonable time, I will do it. But I won't spend an hour by researching how to set autostart=false for all browsers and plugins in the world, sorry.

Right. That is my point. This means for a big black hole the position becomes more and more sharply defined so interference between different black hole positions should play a lesser and lesser role.

Dear Lubos,

I don't have any issue with it and I don't know how to fix it either.. Just wanted to make sure you know it. Interestingly I use the newest Chrome as well and no unusual plugins at least that I am aware of.

Nice appendix :-)

This again makes me think a lot about certain analogies, but I am not sure enough about it to mention it here (should reconsider certain things in a paper I am reading first, darn!) ... :-P

Hmmm. So then it probably wasn't the Haight (which is where I was in the late 1960's -- meh!) Pacific Heights more likely. Still, Werner Erhard? A professional ass-hole.

maybe so, but the problem appears with the extreme cases that appear to violate some principles of physics. Does this indeed happen or is it just the result of some (semiclassical) approximations?

Scientists can be as deluded or superstitious as anyone, maybe even more so.

Dear Mikael, right, a larger black hole's position is more robust. But a larger black hole also takes a much longer time to evaporate - the lifetime goes like M^3 in d=4 - so during the long lifetime, a larger black hole actually jumps by a greater distance than a smaller one.

The information retrieval doesn't occur immediately - the whole information is only retrieved when the whole black hole evaporates. So you can't just be satisfied with what the hole is doing for one second - which is indeed "not much" for a big black hole. You must watch its evolution throughout the lifetime and then it doesn't help that it is a large hole, quite on the contrary.

Right, unitarity (preservation of information) seems violated in the semiclassical approximation but string-theory-based evidence is overwhelming that the exact result going beyond the approximations does agree with unitarity.

ok, if my memory is still ok, and I see no clear reason for it not being so, a supersymmetric theory should obey some localization principles that would lead to the paths to be localized around some points. Would some sort of space-time supersymmetry simplify the problem in this sense? Also, if one considers a sum over topologies would that not imply automatically the "no black-hole" case as the genus 0 case (sphere, or something like that...) ? In all cases, my feeling is that some non-equivalent configurations may in the end look like being the same if analyzed in a topological sense (say, for example, related by some homotopy?). The whole problem of exp(S) may in fact be smaller if one considers the full problem and this, if I see it right, would amount to making exp(-S) larger... am I somewhere wrong?
