Friday, May 17, 2013

String theory = Bayesian inference?

The following paper by Jonathan Heckman of Harvard is either wrong, or trivial, or revolutionary:
Statistical Inference and String Theory
I don't understand it so far but Jonathan claims that one may derive the equations of general relativity – and, in fact, the equations of string theory – from something as general as Bayesian inference by a collective of agents.

It sounds really bizarre because Bayesian inference seems to be a totally generic framework that may be applied anywhere and that says nothing about "what the theories should look like", while general relativity and string theory are completely rigid, specific, well-defined theories. How could they be equivalent?

Jonathan considers a collective of agents who are arranged along a \(d\)-dimensional grid. Each of them tries to reconstruct the probability distribution for events that they observe experimentally. Collectively, these distributions define an embedding of a manifold in another manifold and Jonathan rather quickly states that various conditional probabilities we know from Bayesian inference may be written as Feynman path integrals with actions that include \(\sqrt{\det G}\), \(\sqrt{\det h}\), and similar things!
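
For orientation only, here is a sketch of the two generic ingredients being connected, written in textbook form and not necessarily in the paper's exact notation. The symbols \(\theta\) (fitting parameters), \(E\) (observed events), \(\sigma^a\) (coordinates on the grid of agents), and \(X^I\) (coordinates on the target manifold) are assumptions introduced for this illustration; \(h\) is taken to be the metric on the grid and \(G\) the metric on the target. Bayes' rule for a single agent reads
\[ P(\theta \mid E) = \frac{P(E \mid \theta)\, P(\theta)}{P(E)}, \]
while a standard nonlinear sigma-model action for a map from the grid to the target has the form
\[ S[X] = \frac{1}{2} \int d^d\sigma \, \sqrt{\det h}\; h^{ab}\, G_{IJ}(X)\, \partial_a X^I\, \partial_b X^J , \]
with the associated path integral \(Z = \int \mathcal{D}X \, e^{-S[X]}\). The claim, as far as I can tell, is that the pooled posterior of the collective takes the form of such a \(Z\).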

Again, I don't understand it so far but needless to say, a proof that string theory is the same thing as rational thinking – and not just a subset of rational thinking – would be extraordinarily important. ;-) I will keep on reading it.


  1. japfrugone@yahoo.com, May 17, 2013, 10:36:00 AM

    I read the preprint; in my opinion it is at the same time revolutionary and trivial. How can it be both? It is trivial because it is (almost) circular reasoning (with a fantastic twist): he goes from a modified version of the 'equivalence principle' to the Einstein equations, that is, back to the equivalence principle. The modified version of the equivalence principle he introduces is the Bayesian model where a consensus statistical model exists. It is not that the results of experiments for all agents must agree covariantly, but instead that the statistical models that they can elaborate to understand their experiments must be possible to unify in a consensus model. So that is the novelty here: a statistical equivalence principle (I would call it a consensus principle). The magic thing is what happens in between on this circular journey (and yes, the connection with string theory is surprising, but not gravity emerging from it). What do you think?

  2. Thanks for your intriguing comment but I understand it even less than the preprint. ;-)

  3. Have I got this right: You liberally sprinkle this collection of "agents" around, each one using local(?) observations to construct a statistical model (just fitting some distribution parameters), then in the limit that this collection of agents becomes continuous, the posterior prob. distribution is given by a path integral of a sigma model action. This suggests that, at least in information geometry terms, the sigma model is somehow fundamental - these agents could be making observations and building models of *any* old crap?

  4. I understand it in the same way now...

    Conformal sigma-models are an interesting subclass because they're mapped to "stable inference schemes".

  5. I want to highlight this paragraph:

    "In other words, stable statistical inference selects out two dimensions for the grid of agents. The limiting case where the overall dependence on the number of agents drops out translates to the condition of conformal invariance in the sigma model. As is well known the condition of conformal invariance leads directly to the Einstein field equations for classical gravity. Quantum fluctuations around the background metric arise from fluctuations in the inferred probability distribution. As far as we are aware, this is the first derivation of classical gravity from the condition of stable statistical inference."

    My first impression is that this is something that Gauss must have understood intuitively when he developed his least squares approach. The point is that regardless of overall dimensionality, there is always a one-dimensional distance that can be defined between any actual value and its inferred value. The Pythagorean theorem is only well defined with root 2 (a la Fermat's Last Theorem). Relativity is based on the extension of the Pythagorean theorem to n dimensions (e.g. the collection of two-dimensional manifolds).

    The following is from the article about least squares:

    "Least squares corresponds to the maximum likelihood criterion if the experimental errors have a normal distribution and can also be derived as a method of moments estimator."

    The convolution of error distributions must eventually approach the Gaussian distribution per the Central Limit Theorem, so least squares is a very natural approach to inference for systems with large numbers of observers.

    I think the paper is worth understanding since while at some level it is apparently intuitive, it does begin to link several different thoughts and concepts (assuming correctness). It does speak to the naturalness of string theory as a limit of predictive modeling, which it should be since linear structures should be possible as limits to inference whenever dimensions are greater than zero.

    Is the paper revolutionary? Unfortunately, I think that despite its keen insight, it might fall victim to the times we are in. Some of the things that would have been viewed as revolutionary even just 30 years ago are no longer seen as such. What is more important is that, assuming correctness, it opens a direct path between physics and statistics, where in the past statistics has been viewed as a tool for studying physical relationships and not so much as an integral part of nature. It also reinforces the idea that our mental image of the natural world is the result of sensory-based computation (using the term loosely) and that our common mental images are natural results of our common genetics leading to common processing of information. That is, we should expect to view the world the same way, since we will naturally agree upon the same interpretation of the data sourced from whatever manifold we live in, and the means by which we collect and process data is generally the same since we all share common ancestry.

    Excepting any potential flaws in the derivation, a good paper.

  6. Very interesting thoughts by Jonathan. My difficulty probably boils down to the fact that I have no deep feelings for things like "stable statistical inference" - isn't the whole thing just a way to visualize points on a world sheet as "people" and to give the quantities in the sigma model fancy anthropomorphic names?

    Moreover, if one derives that the world sheet with d=2 is fundamental, it's sort of strange because since the mid 1990s, I/we have viewed the world sheet and its d=2 as an artifact of the weakly coupled limit in string theory.

  7. Lubos,

    I would think that to the extent string theory is a theory of nonlocal hidden variables, Heckman is correct. It's interesting to me that you would imply that string theory without a general proof may be irrational -- after all, the mathematical proofs that support string theory are what make it a coherent (therefore rational) scientific theory and not just a set of speculative propositions; every aspect of the theory corresponds to (albeit retrodictively) real physical phenomena.

    However it is applied, though, Bayesian inference is a slippery slope. It requires a degree of personal belief that some definite probability on the interval [0,1] exists independent of Bernoulli trials. Hence, it is open to the same criticism most often leveled at string theory itself -- lack of specific novel predictions supported by repeated results of experimental tests.

  8. Having read more, here are some additional thoughts. As far as the anthropomorphism goes, we could do without it, but I don't think the idea that d=2 is fundamental in various contexts is an odd one. I think the point is that under perturbations in 2-d one is able to adjust one's initial inference, whereas in higher dimensions one is unable to do so. This I think is closely related to the existence of general analytic solutions in 2-d vs. no general analytic solution in 3-d (cf. the 3-body problem). 2-d also comes into play in boosted spacetimes, so it isn't hard to imagine ordinary spacetime as always having agents in a 2-d grid. I don't know if we can say that 2-d is universally fundamental, since the paper claims that the so-defined 2-d agent space can map to a parameter space, which would seem to be more fundamental.

    The paper states:

    "for each point of \(\Sigma_{\rm agent}\), we get an agent with a corresponding statistical model"

    This seems to suggest that for any point in a boosted spacetime, there exists an M-dimensional statistical model, which can be updated in a smooth way such that all other points agree with the update. This in my mind begins to speak to holography.

    From a more "human systems" perspective, I cannot help but think of how, when we look at the world, we can still build a two-dimensional picture in our mind. Any change we encounter can be smoothly incorporated in that image of the world we see. Philosophically it's a nice analogy.

    As far as some of the statistical comments go, when comparing models of several different types (linear, nonlinear) one makes the comparison by converting the data and predictions to a standard unitized space and comparing the Residual Sums of Squares. The model with the smallest RSS is the better model. I haven't read the previous papers, so I am not fully aware of the specific calculation of posterior probability, but it would seem to be related to the closeness of measured values to actual values (and actual values are only well defined in classical mechanics). The possibility of different agents having different measured values is then a real consideration, so the ability to update inferences seems to be important.

    As far as the 2-d being a weakly coupled limit of string theory, I think the map to parameter space is key. The parameter space seems to be more "real" than the agent space. The fact that there is a map between the spaces seems important, since one could argue that there is some map from some set of values in parameter space that agents in a 2-d grid will always agree to. The importance of the agents then is that they are the ones taking measurements and building models.

    These are just some initial thoughts, hopefully they are congruent.

  9. I agree that if string theory could be derived purely from information-theoretic considerations, that would be extremely interesting. :-) But I also didn't understand this paper. Here's a naive question for starters: forget about string theory, or sigma models, or gravity. Starting from the picture of a bunch of Bayesian agents arranged on a lattice making nearby guesses, how does one even reach the conclusion that the world should be quantum-mechanical rather than classical?

  10. Dear Scott, good question. I don't follow how it exactly works but Jonathan surely seems to claim that he can derive quantum mechanics out of nothing - just from some collective inference - along with the non-linear sigma model.

    He claims that the probability Z(A_coll | E_coll) is given by the path integral - and the latter is quantum mechanical, of course. So he must believe that QM is imposed upon him once he considers the grid of agents. I can't reproduce the proof (so far).

  11. Jonathan Heckman's father is James Heckman, U. Chi. Nobel Prize in Economics 2000.

  12. I didn't know! Despite the fact that Jonathan was my student. ;-)

  13. That shows some humility. He is probably a hell of a scientist.

  14. Well, I guess it's probably not that convenient to be known as a son of a Nobel prize winner. I may remain silent about those things, too.

  15. But isn’t this just another form of the usual relation between Euclidean QFT and statistical mechanics? The partition function took its name due to that…

    Anyway if it's not something trivial like the above then it’s mind boggling.

  16. Hi. At the risk of embarrassing myself (both articles are on my reading table) is this related to the latest bit to be highlighted in SciAm (I know not one of your fave publications)?:

    I subscribe to print edition and to get past the pay-wall they want another few shekels, sorry no.

    Is Bayes the latest craze?

    Thank you.

  17. Seems useless. Why postulate a nonlinear sigma model? It is clear that since it is renormalizable only in 2d, this specific dimension will emerge. On the other hand, it brings nothing new about real compactifications ...

  18. This reminds me of a book called Physics from Fisher Information, from a few years back. The attempt to derive physics from epistemological principles is not something new, although some do it better than others.

  19. You wrote: "there exists an M-dimensional statistical model, which can be updated in a smooth way such that all other points agree with the update. This in my mind begins to speak to holography."

    Well, that in my mind speaks plainly of the "Equivalence Principle". Compare it with this: for any observer in curved space there exists a diffeomorphism such that the transformed coordinate system leaves us with a diagonal metric (flat space).

  20. A recent arXiv paper also involves Bayesian probability theory. Although the grandiloquent comment on the abstract page looks ominous from a kook detection standpoint, and the same might be said of the title, the paper is in an endorsed group and at first glance looks cogent and very interesting.
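
The statement quoted in comment 5, that least squares corresponds to maximum likelihood when the errors are normally distributed, can be checked numerically. The following is a minimal sketch, not anything taken from the paper: the straight-line model, the synthetic data, the noise level `sigma`, and all function names are assumptions made up for this illustration. It relies on the fact that the Gaussian negative log-likelihood equals RSS/(2σ²) plus a term independent of the fit parameters, so both criteria are minimized by the same line.

```python
import math
import random


def least_squares_line(xs, ys):
    """Closed-form ordinary least squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def rss(a, b, xs, ys):
    """Residual sum of squares of the line y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


def gaussian_nll(a, b, xs, ys, sigma):
    """Negative log-likelihood for independent N(0, sigma^2) errors.

    Equals n*log(sigma*sqrt(2*pi)) + RSS/(2*sigma^2); only the second
    term depends on the fit parameters a and b.
    """
    n = len(xs)
    return n * math.log(sigma * math.sqrt(2 * math.pi)) + rss(a, b, xs, ys) / (2 * sigma ** 2)


# Synthetic data (hypothetical): true line y = 2x + 1 with Gaussian noise.
rng = random.Random(0)
sigma = 0.5
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, sigma) for x in xs]

a_hat, b_hat = least_squares_line(xs, ys)
nll_hat = gaussian_nll(a_hat, b_hat, xs, ys, sigma)

# Any move away from the least squares solution raises the Gaussian NLL.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    worse = gaussian_nll(a_hat + da, b_hat + db, xs, ys, sigma)
    print(f"shift ({da:+.1f}, {db:+.1f}): NLL increases by {worse - nll_hat:.4f}")
```

With non-Gaussian errors the two criteria generally pick different fits, which is why the normality assumption in the quoted sentence matters.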