## Friday, January 18, 2008

### Bayesian inference

Two years ago, we discussed Bayesian reasoning in a critical light:

Bayesian probability I, II
I have emphasized that the Bayesian probabilities are subjective in character. They depend on the precise evidence that one uses in his reasoning. It is meaningless to calculate Bayesian probabilities too accurately or claim that science has calculated one of them to be 90%. For example, if a report says that the probability that most of the 20th century warming was caused by man-made CO2 emissions was determined by science and equals 90%, it proves that the authors are just parrots who don't know what such probabilities mean. Why?

If someone's probability that the statement is correct really equals 90%, it means that the person thinks that a better scientist who could actually choose and analyze better and more extensive evidence more carefully would end up saying that the probability of that statement is 100% (with probability 90%) or the probability of that statement is 0% (with probability 10%). The precise figure of 90% is just a subjective result and a temporary state of affairs. The only reason why it is not equal to 0% or 100% is that the question is not settled. Some people can't distinguish subjective psychological conclusions from objective science.

#### Goal of this text

But it turned out that there exists another problem. Other people, and sometimes it is the same ones, also don't know how to look for their own subjective opinions and probabilities rationally. Bayesian inference is a good method to separate assumptions from results and to provide us with a solid methodology to use evidence and arrive at reasonable conclusions about the likelihood of various statements.

So even though nothing changes about my criticism of subjective probabilities, I will dedicate a special positive article to Bayesian inference and shed some light on its relevance for the naturalness problem, anthropic misinterpretations of the landscape, retrodictions, and thermodynamics.

Retrodicting the past
Myths about the arrow of time
Articles on this blog criticizing the anthropic reasoning are far too numerous to be listed. Try the landscape category.

#### Bayes' formula: its meaning

Rev. Thomas Bayes (1702-1761) was more than a Presbyterian minister. He was also a mathematician who gave us a useful formula for how our psychological probabilities should be refined when we obtain new evidence. I recommend this Wikipedia article for a pretty clear explanation. Nevertheless, I will give you mine, too.

In this formula, we investigate different, complementary (and mutually incompatible) hypotheses H_i to explain some phenomena. Before we obtained the new evidence, we had some idea about their likelihood (from previous evidence, from the testimonies of our favorite and wise friends, or from some laws of physics such as the Hartle-Hawking state, if you wish). These subjective probabilities P(H_i) are called the priors. If you like to think in terms of physics, the initial conditions of a physical system are the best example of such hypotheses: each initial state is a hypothesis H_i. Bayes' formula is then a method of retrodiction.

If we don't know anything about the validity of the theories at all, the priors should give a chance to every qualitatively or macroscopically different hypothesis to survive (e.g. 1/N). You shouldn't kill a theory by choosing a ridiculously small prior just because it has a low entropy or a small number of extraterrestrial aliens etc.

Suddenly, we observe some evidence E to occur. In the case of retrodiction, E is a particular feature of the final state that we observe - for example the whole macroscopic or approximate description of such a final state. We use this final state to deduce the initial state. In exact, microscopic physics, there would be a one-to-one correspondence between initial and final states. But if we only know some partial (e.g. macroscopic, in the thermodynamic sense, inaccurate, or otherwise incomplete) information about the final state, it is impossible to uniquely deduce the initial state, not even its basic macroscopic properties.

Predictions become irreversible and retrodictions follow different rules, ideally the rules of Bayesian inference. Because this kind of reasoning inherently depends on the priors, there will always be an uncertainty in any kind of retrodiction.
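The difference between predicting and retrodicting can be made concrete with a toy model. What follows is only a sketch under invented assumptions: the three initial states `A`, `B`, `C`, the macroscopic labels `hot` and `cold`, and both priors are hypothetical and chosen purely for illustration.

```python
# Hypothetical toy system: three microscopic initial states evolve
# deterministically, but we only observe a coarse "macroscopic" label.
EVOLVE = {"A": "hot", "B": "hot", "C": "cold"}  # initial state -> observed label


def predict(initial):
    """Prediction: the observed label follows uniquely from the initial state."""
    return EVOLVE[initial]


def retrodict(observed, prior):
    """Retrodiction: posterior over initial states given the observed label."""
    # P(E|H_i) is 1 if H_i evolves into the observed label and 0 otherwise.
    weights = {
        h: p * (1.0 if EVOLVE[h] == observed else 0.0) for h, p in prior.items()
    }
    total = sum(weights.values())  # this is P(E)
    if total == 0:
        raise ValueError("the observation is impossible under this prior")
    return {h: w / total for h, w in weights.items()}


uniform = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
skewed = {"A": 0.1, "B": 0.6, "C": 0.3}

print(predict("A"))               # needs no prior at all
print(retrodict("hot", uniform))  # A and B come out equally likely
print(retrodict("hot", skewed))   # B is favored: the answer depends on the prior
```

The prediction never looks at the prior, while the retrodiction changes whenever the prior does.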

Because rationally thinking people want to avoid the base rate fallacy and they care about the evidence, their guess about the probability of different hypotheses (or initial conditions) is influenced by the evidence. The probability of a hypothesis after we obtained evidence E, namely the so-called posterior probability P(H_i|E), written down as the conditional probability of H_i given the evidence E, will be different from the prior P(H_i). But how much will it differ?

#### Bayes' formula: a derivation

First, assume that every hypothesis H_i either claims that the evidence should occur, i.e. the conditional probability P(E|H_i)=1, or it shouldn't occur, P(E|H_i)=0. How does the evidence that E has actually occurred influence the probabilities?

Well, it's easy in this case. We simply eliminate the hypotheses that have been falsified, i.e. those H_i with P(E|H_i)=0 that predict that the evidence E shouldn't occur. Note that all your knowledge of dynamics and Feynman diagrams is only used to calculate the conditional probabilities of evidence E in a hypothesis, P(E|H_i). That's where all the dynamics is hidden.

But when E is observed, we shouldn't change the ratios of probabilities of the hypotheses that have survived because all of them passed equally well. So we must only renormalize all of these probabilities by a universal factor independent of the particular hypothesis H_i, a factor chosen to guarantee that the sum of P(H_i|E) over i equals one. The correct formula is thus:

P(H_i|E) = P(H_i) P(E|H_i) / P(E)
where P(E) is the normalization factor equal to the sum of P(H_i)P(E|H_i). Note that with this factor, the sum of the posterior probabilities P(H_i|E) equals one. Also, this number P(E) is equal to the weighted average of the conditional probabilities P(E|H_i) of the evidence E over different hypotheses. It is naturally weighted by the prior probabilities of these hypotheses which is why it is natural to call it simply P(E), the so-called marginal probability of seeing the evidence E.

Check that the formula has all the desired properties. If a hypothesis H_i predicts that E shouldn't occur, i.e. if the conditional probability of E given H_i, namely P(E|H_i)=0, then the posterior probability of H_i will also vanish. The hypothesis has been falsified. For the hypotheses where P(E|H_i)=1, we see that P(H_i) and P(H_i|E), the prior and posterior probabilities, only differ by the universal normalization factor which is what we wanted.

In reality, hypotheses often imply that the evidence E isn't sharply predicted or sharply impossible. Instead, a hypothesis H_i can predict E to have a probability P(E|H_i) between 0 and 1, given e.g. by the expectation value of a projection operator for the evidence E in quantum mechanics. It is natural to make the posterior probabilities P(H_i|E) linear in the conditional probabilities P(E|H_i). In other words, we can use the displayed formula above even if the conditional probabilities are generic numbers in between 0 and 1. Then it is the real Bayes' formula.
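For readers who prefer code to formulae, the update rule can be sketched in a few lines of Python. The function name `posterior` and all the numbers are illustrative assumptions, not anything taken from a library or from the discussion above.

```python
def posterior(priors, likelihoods):
    """Return the list P(H_i|E) = P(H_i) P(E|H_i) / P(E)."""
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per hypothesis")
    joint = [p * l for p, l in zip(priors, likelihoods)]
    p_e = sum(joint)  # marginal probability P(E), the normalization factor
    if p_e == 0:
        raise ValueError("every hypothesis predicts that E is impossible")
    return [j / p_e for j in joint]


priors = [0.5, 0.3, 0.2]  # hypothetical priors for three hypotheses

# Sharp case: the third hypothesis is falsified, the survivors keep their ratio.
sharp = posterior(priors, [1, 1, 0])

# Generic case: likelihoods strictly between 0 and 1.
soft = posterior(priors, [0.9, 0.5, 0.1])

print(sharp)
print(soft)
```

In the sharp case the ratio of the first two posteriors stays at 0.5 : 0.3, as the derivation demands, and in both cases the posteriors sum to one.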

Everything clear?

#### Avoiding repeated evidence

You might ask why the relationship between P(H_i|E) and P(E|H_i) is linear. Well, the choice is natural because you can imagine that H_i is divided into equally likely "subhypotheses" H_ij, some of which (i.e. for some choices of j) have P(E|H_ij)=0 while others have P(E|H_ij)=1. In this setup, P(E|H_i) is the proportion of the subhypotheses H_ij of H_i with P(E|H_ij)=1. With this interpretation, the continuous Bayes' formula may be derived from the formula where P(E|H_i) are only allowed to be 0 or 1, assuming that you will choose the prior probabilities P(H_ij) of the subhypotheses to be independent of j.
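The subhypothesis argument can be checked with exact fractions. This is a sketch with made-up numbers: two hypotheses, likelihoods 3/4 and 1/4, and four subhypotheses each. The helper `posterior_binary` is a hypothetical name for the "eliminate and renormalize" rule.

```python
from fractions import Fraction


def posterior_binary(priors, passes):
    """Bayes with 0/1 likelihoods: drop falsified hypotheses, renormalize."""
    surviving = [p if ok else Fraction(0) for p, ok in zip(priors, passes)]
    total = sum(surviving)
    return [s / total for s in surviving]


priors = [Fraction(1, 3), Fraction(2, 3)]
likelihoods = [Fraction(3, 4), Fraction(1, 4)]  # P(E|H_i), illustrative values
n_sub = 4  # number of equally likely subhypotheses H_ij per hypothesis

sub_priors, sub_passes, owner = [], [], []
for i, (p, l) in enumerate(zip(priors, likelihoods)):
    n_pass = int(l * n_sub)  # how many H_ij predict E with certainty
    for j in range(n_sub):
        sub_priors.append(p / n_sub)  # prior independent of j
        sub_passes.append(j < n_pass)
        owner.append(i)

# Update the subhypotheses with the 0/1 rule, then add them up per hypothesis.
sub_post = posterior_binary(sub_priors, sub_passes)
coarse = [
    sum(q for q, o in zip(sub_post, owner) if o == i) for i in range(len(priors))
]

# The linear formula applied directly to the original hypotheses.
p_e = sum(p * l for p, l in zip(priors, likelihoods))
linear = [p * l / p_e for p, l in zip(priors, likelihoods)]

print(coarse, linear)
```

Both routes give the same posteriors, 3/5 and 2/5, which is the statement that the linear formula follows from the 0/1 formula.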

But the question why the relationship is linear is a good one, anyway. For example, if someone incorrectly uses the same (or not at all independent) evidence E to refine his probabilities of hypotheses twice or thrice, he could end up with a quadratic or cubic relationship.

If someone uses the evidence 2500 times, as the IPCC does, the relationship will be a power law with the exponent equal to 2500. Posterior probabilities calculated in this way will effectively set the probabilities of all the theories H_i except for one that maximizes P(E|H_i) equal to zero. But such posterior probabilities are, of course, completely wrong. The corresponding logical fallacy that pretends that one or the most likely alternative (even among many) must occur is called the appeal to probability and it is the most frequent argument in all kinds of alarmism and paranoia. The correct relationship must be linear and only independent evidence may be used to refine the probabilities of hypotheses in the Bayesian reasoning.
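A short numerical sketch shows how quickly repeated use of the same evidence destroys the competitors. The four likelihoods are hypothetical and deliberately close to each other. The computation is done with logarithms only because the 2500th power would otherwise underflow, and it assumes strictly positive priors and likelihoods.

```python
import math


def posterior(priors, likelihoods, times_used=1):
    """Posterior when the same likelihood factor is applied `times_used` times.

    times_used=1 is the correct Bayes' formula; larger values mimic the
    mistake of counting the same evidence repeatedly.
    """
    log_w = [
        math.log(p) + times_used * math.log(l) for p, l in zip(priors, likelihoods)
    ]
    top = max(log_w)  # subtract the maximum to avoid numerical underflow
    weights = [math.exp(x - top) for x in log_w]
    total = sum(weights)
    return [w / total for w in weights]


priors = [0.25, 0.25, 0.25, 0.25]
likelihoods = [0.50, 0.45, 0.40, 0.35]  # hypothetical, nearly comparable

once = posterior(priors, likelihoods, times_used=1)
many = posterior(priors, likelihoods, times_used=2500)

print(once)  # all four hypotheses remain viable
print(many)  # the hypothesis with the largest likelihood takes everything
```

With a single use, the leading hypothesis gets less than 30 percent. With 2500 uses, the same slight edge is inflated into practical certainty.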

(Global warming alarmism also uses many other kinds of logical fallacies such as argument from precedent - comparing the current climate with some events in the past -, fallacious slippery slope, the fallacy fallacy, bare assertion fallacy, the informal fallacy, if-by-whiskey fallacy about the quality of life in a warmer world, and dozens of other fallacies: see the list. But we don't have space here to cover all of them.)

#### Asymmetry of H and E

As we have repeatedly discussed in the articles about thermodynamics, H is not a mirror image of E in this framework. Well, let me be more careful. It is true that Bayes' formula can be written in the following H-E symmetric form,

P(H_i|E) / P(H_i) = P(E|H_i) / P(E),
which is why you should believe that its essence kind of remembers the underlying H-E symmetry that becomes the time-reversal symmetry if we use the formula for retrodictions.

However, the interpretation of different objects in the formula above is asymmetric. For example, the conditional probability P(E|H_i) is a sharply calculable prediction of the hypothesis H_i for seeing evidence E, for example the expectation value of a projection operator representing E in quantum mechanics (or a sum of squared amplitudes) with H_i as the initial state. It has an objective and unchanging meaning. On the other hand, the posterior probability P(H_i|E) is a subjective probability of a hypothesis after we have taken some evidence E into account. It has no objective or eternal meaning, especially because it depends on the priors P(H_i).

Most importantly, P(H_i|E) is not equal to P(E|H_i) even though some people incorrectly think that time evolution is time-reversal-symmetric even when the information is incomplete: this assumption would imply that the two quantities should be equal to each other, much like the squared absolute values of the inner products of "evolved initial" and "final" states in both orders. But they are not equal. The mistaken belief that they are approximately or exactly equal is so widespread that it has a name: it is called "the conditional probability fallacy". Mathematician John A. Paulos explains that the mistake is often made by highly-educated non-statisticians such as doctors and lawyers (and cosmologists such as Sean Carroll).
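The standard textbook illustration of the conditional probability fallacy is a medical test. The numbers below are invented for the example (a 1% prior, a 99% detection rate, a 5% false-positive rate) and are not taken from Paulos or any other source.

```python
# Hypothetical numbers: H = "the patient has the disease", E = "the test is positive".
p_h = 0.01              # prior P(H)
p_e_given_h = 0.99      # P(E|H), what the hypothesis predicts
p_e_given_not_h = 0.05  # P(E|not H), the false-positive rate

# Marginal probability of the evidence, weighted by the priors.
p_e = p_h * p_e_given_h + (1 - p_h) * p_e_given_not_h

# Bayes' formula for the posterior.
p_h_given_e = p_h * p_e_given_h / p_e

print(p_e_given_h)  # 0.99
print(p_h_given_e)  # about 1/6, nowhere near 0.99
```

P(E|H) is 99% while P(H|E) is roughly 17%. The two quantities differ by the factor P(H)/P(E), which is where the prior enters.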

Analogously, P(H) and P(E) play a different role, too. P(H) is purely subjective - or it depends on previous data that have nothing to do with the new evidence E - while P(E) depends both on the subjective likelihoods of H_i as well as calculations of the evidence E in the different hypotheses.

The purpose of this short section was to repeat that retrodictions in physical theories are not canonical, unlike predictions. They always depend on priors. Once any kind of incomplete information occurs in your discussion - e.g. if you only study the system at the macroscopic level - retrodictions follow very different rules than predictions. That's why high entropy may be predicted in the future but not in the past. The people who still don't get this basic asymmetry are probably just too zealous or too stupid.

#### Bayesian inference and naturalness

But I also want to discuss other topics related to Bayes' formula. We will dedicate a few paragraphs to a simple question, namely the interpretation of naturalness. Naturalness in particle physics says that dimensionless parameters in the Lagrangian are expected to be around one. Is it a universal law of physics?

No, it is just a psychological expectation. Consider the QCD theta-angle. We know that a shift by 2π is physically inconsequential which is why the QCD theta-angle is a priori a number between 0 and 2π. If we don't know anything, we should assume a uniform probability distribution for this parameter.

Is such an assumption canonical? Nope. It is just a guess. For example, you could also think that a power of theta, not theta itself, has a uniform distribution which would be equivalent to a different distribution for theta. In this case, a uniform distribution for theta itself sounds more "intuitive" because it measures the volume on a moduli space once theta becomes a modulus but there is no hard proof that it is the correct one. Equally importantly, different sensible distributions that are uniform in simple functions of theta lead to the same qualitative conclusions.

What conclusions? Well, if we assume the prior probability P(H_i) to be uniform - in this case, we must clearly use a continuous setup of Bayes' formula where probabilities P(E|H_i) and P(H_i|E) become densities and sums over i must be turned into integrals over theta - the probability that theta is gonna be smaller than 10^{-9} is 10^{-9} over two pi, which is clearly smaller than 10^{-9}. Even with slightly different distributions, the probability will be very small.
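To see how little the qualitative conclusion depends on the choice of the "uniform" variable, here is a sketch. It assumes that theta raised to some positive power is uniformly distributed on its allowed interval; the three powers tried are arbitrary illustrative choices.

```python
import math

THETA_MAX = 2 * math.pi  # theta is a priori between 0 and 2*pi
BOUND = 1e-9             # the experimental upper bound quoted in the text


def prob_theta_below(bound, power=1.0):
    """P(theta < bound) if theta**power is uniform on [0, (2*pi)**power].

    power=1 is the uniform prior for theta itself; `power` must be positive.
    """
    return (bound / THETA_MAX) ** power


for power in (0.5, 1.0, 2.0):
    print(power, prob_theta_below(BOUND, power))
```

The uniform prior for theta gives about 1.6 times 10^{-10}. Making theta squared uniform gives a much smaller number, and making the square root of theta uniform gives roughly 10^{-5}. All three are small enough to be surprising.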

So we should be surprised. Of course, there is no contradiction here if theta is measured to be smaller than 10^{-9} as it indeed is. But the surprise strongly suggests that the prior probability is probably unrealistic. In other words, there must be some other, so far unknown and not quite random physical phenomenon or phenomena (for example, a new substructure of the particles and their quantum fields or a new symmetry that implies new cancellations, at least approximate ones) that make small values of theta (or zero) more likely. Once you understand these phenomena (perhaps axions, in this case) more correctly, your expectations for the distribution of theta will obviously change. If you are lucky, the strong CP-problem - the puzzle why theta is so surprisingly small - will evaporate.

Once we understood inflation, the huge size or mass of the observable Universe in Planck units also became less mysterious. There are many examples of this kind. Science is about making surprises less surprising, after all.

Incidentally, in the case of the theta angle, we seem to know that the anthropic principle can't be enough to show that theta is very small because life could probably exist for large theta, too. If someone claims that the anthropic principle clarifies all unnatural puzzles and hierarchies in current particle physics, this observation of mine pretty much falsifies his statement. The anthropic principle is not enough to make all small numbers sound natural. It can constrain others. But is it unexpected that some quantities are constrained by life and others are not? Can the tautological fact that a correct theory of the Cosmos must be compatible with life be used to derive anything non-trivial about the Universe? Which things can be derived and which things can't? How many of them should be derivable (clearly not all of them)? How do you decide in which of them the anthropic explanation is enough and which of them should get a better one? Is there any rationally justifiable answer here?

I am not aware of one. The statement by Nima et al. that the dimensionful parameters of the Lagrangian are more affected by anthropic arguments than the dimensionless ones is the closest thing to a rationally semi-justifiable observation I can think of.

#### Bayesian inference and the anthropic principle

Bayesian inference allows us to sharply distinguish what is our assumption - the prior probabilities - and what is actually being deduced from some evidence E. Some people use seemingly rational proportionality laws that they present either as results of some evidence E - which they are clearly not - or as justified priors - even though there exists no rational justification for such priors. These mistakes have been pointed out and corrected by many people, for example by Hartle and Srednicki in Phys Rev D, but many people still don't get it.

The first logical fallacy is called the "selection fallacy" by Hartle and Srednicki and it involves counting the more or less intelligent observers, the density of life, or counting the vacua in a class of stringy compactifications in order to find out which kind of background is more likely to describe the real Universe, assuming that we are "typical representatives" of a class. People often say that a class of stringy vacua is likely to be correct because it has many elements. Others say that universes with a high expectation value of intelligent observers or a higher density of life per galaxy are more likely.

Are these things scientific? In other words, do they follow from Bayesian inference?

For example, let us consider the most important example in which the existence of humans is the evidence E from the formula and we use it to refine our subjective probabilities of different stationary points in the landscape. Some people argue that the theories with a (very) large portion of the Universe occupied by intelligent life are (strongly) preferred. Is that true?

To answer this question, we must be very careful what E actually says. The evidence we have says that at least at one point of this observable Universe, there exists a human civilization. More concretely, the available evidence doesn't say that at a randomly chosen place of this Universe, one finds a human civilization. This is a very important subtlety. ;-)

Why is this subtlety so confusing? Because we may say that our planet is located at a random point of the Universe - a sentence that sounds correct and almost equivalent to the previous one. But the meaning of the word "random" is different than it was in the previous paragraph. The people who don't distinguish the role of the adjective in these two contexts are, in fact, making the very same error as the people who don't distinguish predictions and retrodictions.

When we say that our planet is located at a random place of the Universe, it means that we are not aware of any special properties of our Galaxy or the region occupied by the Solar System. If you pick a random galaxy and a random star in it, you end up with stars that are pretty similar to the Sun. That's why the sentence "We live at a random place of the Cosmos" is kind of correct.

But you don't end up with the Sun itself (you shouldn't forget to randomize your random generator!) which is why the sentence "At a random place of the Universe, you find humans" is obviously wrong. Random stars in random galaxies usually don't have life on them and even if they do, the creatures don't look like us. If there were humans with TV antennas orbiting nearby stars, we would have already detected many of them.

So it is very important to notice that there is no evidence behind the statement "At a random place of the Universe, you find humans" because in this sentence, you allow the other people to run their random generators and look at their random places. They will find no life over there. If you want to use life as evidence to refine your ideas about the validity of different theories, you must formulate E more carefully.

Once again, the justifiable statement is that "At least one star in the Universe is orbited by a planet with humans". Furthermore, you may add another justifiable observation that the lively star looks much like many other stars. If a hypothesis H_i seems to predict that there are humans somewhere in the Universe, it doesn't matter how many civilizations or how high density it predicts. It simply passes the test.

If you deal with a theory of a multiverse or a class of string compactifications, which involve more or less well-defined sets of stringy backgrounds in both cases, the corresponding hypothesis really says that "Our planet lives in the universe that is correctly described by [at least] one vacuum in the corresponding set of vacua."

Again, this is the hypothesis we are testing. If you use the existence of life to decide which class of compactifications is more likely, the only thing that matters is whether at least one vacuum in the class is good enough to admit life similar to ours. Once a class of vacua passes the test, it just passes.

Whether life of our kind is predicted in a large percentage of the compactifications or a small percentage of compactifications is clearly irrelevant. If you go through the exact formulation of the hypotheses and evidence and use the correct Bayes' formula, you will see that I am right and the anthropic people are simply making a mistake. Their reasoning was too sloppy. There is no mystery here.
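The two ways of using the evidence can be put side by side. This is a sketch with entirely hypothetical classes of vacua, counts, and priors. `posterior_at_least_one` encodes the evidence as formulated above, while `posterior_typicality` encodes the weighting by the fraction of lively vacua that I am criticizing.

```python
# Hypothetical classes: (prior, number of vacua, number admitting life like ours).
CLASSES = {
    "class_A": (0.5, 10**6, 10**5),  # life is generic in this class
    "class_B": (0.5, 10**6, 3),      # life is rare but possible
}


def normalize(weights):
    """Divide all weights by their sum so that they add up to one."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def posterior_at_least_one(classes):
    """E = 'at least one vacuum in the class admits life like ours' (0/1 test)."""
    return normalize({
        name: prior * (1.0 if n_life >= 1 else 0.0)
        for name, (prior, _n_total, n_life) in classes.items()
    })


def posterior_typicality(classes):
    """The criticized weighting: likelihood = fraction of lively vacua."""
    return normalize({
        name: prior * n_life / n_total
        for name, (prior, n_total, n_life) in classes.items()
    })


print(posterior_at_least_one(CLASSES))  # both classes pass, priors unchanged
print(posterior_typicality(CLASSES))    # class_B is suppressed by its low fraction
```

With the carefully formulated evidence, the posteriors equal the priors because both classes simply pass. The typicality weighting punishes the second class by a factor of tens of thousands without any evidence that would justify it.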

#### So why do I like heterotic vacua?

You might say that if I deny that an observed property of the Cosmos should better be generic in a promising class of vacua, I also undermine a reason why I believe that the heterotic vacua - that can pretty simply and naturally give the Standard Model gauge group within a grand unified framework - are more likely to be correct than the type IIB flux vacua. Some vacua in the type IIB set will have these properties, too, and I just said that one was enough. So isn't it a tie?

OK. So how do I formulate my thinking in the Bayesian framework? In this case, it is all about the priors. I simply believe that there exists a cosmological mechanism that makes "simple" vacua, with a proper definition of the word "simple", more likely to result from a cosmological evolution and more likely to survive various instabilities, inconsistencies, and dualities that will be discovered in the future. Or perhaps, a new theory of initial conditions will assign simple vacua a greater weight. Simple vacua are preferred much like low-lying states of a cool enough harmonic oscillator. Because of this reason, my prior probabilities are concentrated around the "simple" representatives of various compactifications, for example the heterotic compactifications with small Hodge numbers or braneworlds with small numbers of branes or small fluxes.

In the type IIB set, my prior is mostly located at too simple flux compactifications that simply do not give us the correct gauge group or the correct fermion spectrum. With this kind of reasoning, I end up thinking that the heterotic vacua that predict pretty good physics "without much work" and with "special-looking manifolds" are actually more likely to be true than numerous vacua that are more generically incorrect. But I also realize that this conclusion depends on my prior belief in some kind of simplicity of the world, in Nature's tendency to choose special compactifications.

#### Constraints from a small cosmological constant

The cosmological constant is observed to be something like 10^{-123} in the Planck units. This observation is the main empirical evidence used to defend the anthropic ideas. Using Bayesian reasoning, does the small cosmological constant actually imply that we must live in a vacuum inside a dense discretuum i.e. a huge landscape of possibilities?

As usual, the answer depends on the priors. First, let us assume that the cosmological constant in any realistic, supersymmetry-breaking vacuum must be a random number whose distribution is peaked somewhere around the Planck density. Then, it is indeed unlikely for a string compactification to generate any region of space where the cosmological constant would be so tiny. The probability that at least one place or bubble has the right cosmological constant approaches one as soon as you consider an ensemble of 10^{123} vacua or more. That's why the anthropic people like the large landscape.
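The counting behind this statement can be sketched numerically. The sketch assumes that each vacuum independently has probability 10^{-123} of producing a small enough cosmological constant, an idealization introduced here for illustration. The functions `log1p` and `expm1` are used because the naive expression 1-(1-p)^N loses all precision for such a tiny p.

```python
import math


def prob_at_least_one(p_single, n_vacua):
    """1 - (1 - p)^N, evaluated stably for tiny p and huge N."""
    return -math.expm1(n_vacua * math.log1p(-p_single))


P_SMALL_LAMBDA = 1e-123  # chance for one vacuum, under the stated assumption

for n_vacua in (1e100, 1e123, 1e125):
    print(n_vacua, prob_at_least_one(P_SMALL_LAMBDA, n_vacua))
```

Under these assumptions, 10^{100} vacua leave the probability at about 10^{-23}, an ensemble of 10^{123} vacua gives about 0.63, and a couple of additional orders of magnitude push the probability practically to one.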

Imagine that you only consider a small set of candidate vacua, for example the 10+10+10+10 most beautiful heterotic, Hořava-Witten, G_2 holonomy, and F-theory vacua. What does the observed cosmological constant - the evidence E - tell you about the probabilities? Well, indeed, the uniform priors would imply that the tiny observed cosmological constant would make it unlikely for one of these 40 theories to be correct.

However, my prior is not uniform. I think that there can exist many potential mechanisms such as the cosmological seesaw mechanism that make small values of the cosmological constant pretty natural. I am not certain about the existence of such a mechanism but I assign a non-negligible probability to its existence. This nonzero probability therefore influences my (inaccurately known, in this case) conditional probabilities for various theories to generate various values of the cosmological constant so that a small cosmological constant is simply not astronomically unlikely anymore: there is a finite "tail" near zero. With these assumptions, I don't need a huge set of possibilities.
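The effect of such a belief on the conditional probabilities can be sketched as a mixture. The assumptions are mine and deliberately crude: if a mechanism exists, it produces a tiny cosmological constant with certainty, and if it does not, the chance is 10^{-123}. The probability `q_mechanism` assigned to the mechanism is a free input.

```python
P_SMALL_IF_RANDOM = 1e-123  # no mechanism: Lambda is a random Planckian number
P_SMALL_IF_MECHANISM = 1.0  # assumption: a working mechanism always gives tiny Lambda


def prob_small_lambda(q_mechanism):
    """P(small Lambda | theory), averaged over whether a mechanism exists."""
    return (
        q_mechanism * P_SMALL_IF_MECHANISM
        + (1 - q_mechanism) * P_SMALL_IF_RANDOM
    )


for q in (0.0, 0.01, 0.5):
    print(q, prob_small_lambda(q))
```

Even a 1% belief in a mechanism raises the conditional probability from 10^{-123} to about 0.01, so a small set of candidate vacua is no longer astronomically disfavored.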

What I say should have been expected. Whether or not you need a huge landscape depends on your beliefs. If your priors reflect your belief that there can't exist any mechanisms or alternative calculations making small lambda likely, a huge landscape of 10^{123}+ vacua is almost necessary. If you believe that there is a chance that a more detailed calculation can actually show that the cosmological constant likes to be small, the huge landscape is not needed.

If the correct answer is that there are way too many vacua and we live in a rather generic one, it still doesn't tell you much about other questions. For example, even if you know that the cosmological constant (or the number of dimensions of space) has an anthropic explanation, it is no free ticket for the anthropic explanations to spread.

Whether or not the strong CP-problem is explained by having many vacua is a new question, unrelated to the cosmological constant. And the answer to this question is almost certainly that the right explanation is non-anthropic, e.g. axions. These answers - whether the anthropic explanation is relevant for some question - primarily depend on something other than the universal, religious power of the anthropic principle. Quite on the contrary, these answers depend on the existence of deeper and more accurate explanations for the individual features of the Universe. Every physically independent question is a new one.

#### Life should be likely: but how likely?

You often hear that theories that predict that our life is more likely are more acceptable than the theories that predict that our life is much less likely. Indeed, this is a correct principle. In Bayes' formula, the theories that probably lead to life have a higher value of P(E|H_i), where E stands for life, which increases P(H_i|E), too.

However, once again, you must be very careful what the probability P(E|H_i) means. It is the probability that life E emerges somewhere - in at least one region - in the observable Universe predicted by the theory H_i. If your theory H_i predicts that a huge fraction of stars have life, it doesn't increase its posterior probability P(H_i|E) simply because the high density doesn't matter since there is no evidence for such a high density! One lively planet is good enough. You can't choose a probability P(E|H_i) greater than one.

If you imagine that a theory predicts a spatially infinite Universe, you could protest that such an infinite Universe will inevitably generate humans somewhere and my prescription assigns an unfairly high probability to such a theory. You might think that such a theory should be punished for predicting a very low density of life. I disagree. One planet predicted by such a theory where the phenomena look just like the phenomena observed from the Earth and follow the same patterns and relationships is simply good enough for the theory to pass the test of life.

In this context, you should notice that a theory that produces Boltzmann's brains in a spatially and temporally infinite Universe may also pass the test of life but it fails other tests. Indeed, life can emerge somewhere in a spatially and temporally infinite Universe in the form of Boltzmann's brains. So the conditional probability P(E|H_i) where E is life and H_i is a theory with the infinite Universe is equal to one and the theory is not punished by the observed existence of life at all, whether or not the theory predicts burning stars!

On the other hand, the binary fact about the existence of life is not the only evidence that can be used to refine the probabilities of various theories. Additional evidence implies, among billions of other things, that the observations E_2 from many telescopes are consistent with an ordered Big Bang cosmology. The probability of such an outcome predicted by Boltzmann's brains is something like exp(-s) where s is the number of data points ever measured in science. ;-) It is this evidence - the observed order of the real world that seems to make sense - that effectively rules out Boltzmann's brains as a correct explanation. But the observed existence of life itself is simply not constraining enough to do so!
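The bookkeeping of the two pieces of evidence is easiest in logarithms. This sketch compares an ordered Big Bang cosmology with a Boltzmann's brain theory. The number of data points `s`, the equal priors, and the assumption that the ordered cosmology predicts the orderly data with probability close to one are all illustrative choices.

```python
import math


def log_posterior_odds(log_prior_odds, log_likelihood_ratios):
    """log of P(H_1|evidence) / P(H_2|evidence) for independent evidence."""
    return log_prior_odds + sum(log_likelihood_ratios)


s = 10**6  # hypothetical number of data points ever measured

# E_1 = life exists somewhere: both theories predict it with probability one.
log_lr_life = math.log(1.0) - math.log(1.0)

# E_2 = the data look ordered: assumed probability ~1 for the Big Bang theory
# and exp(-s) for Boltzmann's brains, so the log likelihood ratio is +s.
log_lr_order = 0.0 - (-s)

after_life = log_posterior_odds(0.0, [log_lr_life])
after_order = log_posterior_odds(0.0, [log_lr_life, log_lr_order])

print(after_life)   # 0.0, the existence of life does not discriminate
print(after_order)  # the odds become exp(s) to one against the brains
```

The existence of life leaves the odds untouched. All of the discrimination comes from the second piece of evidence.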

Some people might correctly want to punish Boltzmann's brain theories but they don't identify the correct reason why we know that these theories are very unlikely. The reason is not the known existence of life or a low density of life predicted by those theories but the observed order of our empirical data that is predicted to be very unlikely by every Boltzmann's brain theory.

#### Summary

I recommend that everyone learn the formulation and a proof of Bayes' formula and use them carefully whenever there is a controversy about the calculation of some probabilities, especially when the differences between people's opinions about some probabilities become exponentially huge or when there is a dispute about the difference between assumptions and the insights obtained from the evidence.

Once you do it, many arguments may be shown to be simply wrong while others might be shown to be nothing else than an encoded version of the author's preconceptions, preconceptions that are supported by no evidence. Technically, the latter mistake is based on choosing exponentially small priors for sensible (and probably true) theories.

Conclusions that the young Universe had to have a high entropy; that a scientific theory predicts that we should be Boltzmann's brains; that classes of vacua are better if they produce bigger Universes with denser life or if the class of compactifications is very numerous - all these conclusions may be sharply identified as results of faulty or sloppy reasoning, incorrect versions of Bayes' formula, misinterpretation of the hypotheses or the available evidence, or illegitimate choices of prior probabilities that suppress the correct answers a priori.

And that's the memo.

#### Bonus: craziness of Bousso et al.

While Hartle and Srednicki are not only right but also win the citation-count battle in this typicality discipline, there are still people who disagree with their (and Bayes') obviously correct rules.

For example, Bousso, Freivogel, and Yang argue on page 1 of their bizarre paper about Boltzmann babies that Hartle and Srednicki's rule that we are not allowed to assume our civilization's typicality implies that we can't deduce anything from our predictions and that science as we know it is impossible. In other words, the anthropic principle is a pillar underlying all of science. Wow. ;-)

In their thought experiment, a theory T1 predicts the electron in your lab to have spin up with probability epsilon (much smaller than one) while T2 predicts spin down with probability epsilon. If you measure the spin to be up, T1 is pretty much falsified while T2 is confirmed.
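The update behind that last sentence can be written out with Bayes' formula. The thought experiment does not specify the priors; taking them equal is an assumption made here purely for illustration:

```latex
P(T_1 \mid \text{up})
  = \frac{P(\text{up} \mid T_1)\, P(T_1)}
         {P(\text{up} \mid T_1)\, P(T_1) + P(\text{up} \mid T_2)\, P(T_2)}
  = \frac{\epsilon \cdot \tfrac{1}{2}}
         {\epsilon \cdot \tfrac{1}{2} + (1 - \epsilon) \cdot \tfrac{1}{2}}
  = \epsilon ,
\qquad
P(T_2 \mid \text{up}) = 1 - \epsilon .
```

Under that assumption, a single measurement of spin up pushes T1 down to probability epsilon and T2 up to 1-epsilon.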

Bousso et al. claim that one can't make this conclusion in the Hartle-Srednicki setup. Why? Because - hold your breath - we should actually compute the probability that the spin is up in at least one laboratory of the Universe predicted by T1 and this probability is not epsilon but X = 1-(1-epsilon)^L where L is the number of labs in the Universe. This number X is effectively equal to one for very large L, which leads to the opposite of the correct conclusion.
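To get a feeling for how quickly X approaches one, here is a short numerical sketch of the formula X = 1-(1-epsilon)^L. The function name and the sample values of epsilon and L are illustrative assumptions; the labs are treated as independent, as the formula itself requires.

```python
import math

def prob_at_least_one_up(eps, n_labs):
    """X = 1 - (1 - eps)**n_labs for n_labs independent labs.

    log1p/expm1 are used so that the result stays accurate
    even when eps is tiny and n_labs is huge.
    """
    return -math.expm1(n_labs * math.log1p(-eps))

eps = 1e-6  # illustrative value of epsilon
for n_labs in (1, 10**3, 10**6, 10**9):
    print(f"L = {n_labs:>10}:  X = {prob_at_least_one_up(eps, n_labs):.6g}")
```

For L = 1 the function returns epsilon itself; once L is much larger than 1/epsilon, X is indistinguishable from one.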

The conclusion by Bousso et al. is of course complete rubbish. When T1 predicts P(up)=epsilon, it is a probabilistic prediction that applies to every single lab in the Universe with the same initial conditions. It holds for typical labs as well as atypical labs, labs led by men and women, liberals and conservatives. In fact, the free will theorem guarantees that the electrons randomly decide according to the statistical predictions and that they are not affected by the lab in which they live or by any of the data in its past light cone: you can't really divide the labs into typical and atypical ones because all the electrons are free and their random decisions are unaffected by their environment (e.g. by hidden variables, which are thus forbidden).

By the way, it is useful to have many labs or many copies of the experiment if you want to measure the probabilities more accurately. Bousso et al. argue that according to the Bayesian reasoning, having many labs makes things less conclusive, which sounds like complete madness to me. I don't even know what confusion leads them to this conclusion so I can't discuss it. But having many labs is a different topic and I want to talk about the single-lab setup only.
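As a brief aside on the remark that many copies of the experiment sharpen the result: a minimal sketch, assuming that every lab repeats the same experiment independently and reports its own outcome. The function name and the numbers are hypothetical.

```python
import math

def log_odds_t2_vs_t1(n_up, n_down, eps):
    """Natural log of P(data | T2) / P(data | T1).

    T1 predicts P(up) = eps, T2 predicts P(up) = 1 - eps, and the
    individual measurements are assumed to be independent.
    """
    step = math.log((1 - eps) / eps)  # contribution of one 'up' result
    return (n_up - n_down) * step

eps = 0.01  # illustrative value of epsilon
one_lab = log_odds_t2_vs_t1(n_up=1, n_down=0, eps=eps)
ten_labs = log_odds_t2_vs_t1(n_up=10, n_down=0, eps=eps)
print(f"1 lab:   log odds for T2 = {one_lab:.2f}")
print(f"10 labs: log odds for T2 = {ten_labs:.2f}")
```

Each additional "up" adds the same positive amount to the log odds, so under these assumptions more labs make the verdict sharper, not blurrier.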

When you say that T1 predicts P(up)=epsilon in your lab, you don't need to be making any assumption about your lab whatsoever, except for the assumptions and initial conditions that were used to calculate the result. The theory has already made the prediction for P(up) and it was epsilon. The statement that "at least one lab in the Universe saw the electron spinning up" is a completely different statement than the statement that "the electron in your lab - or any other one concrete lab - is spinning up". Raphael et al. seem to mix up these two different statements.
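The gap between the two statements is easy to check with a toy simulation. This is only a sketch under stated assumptions: the labs are independent, lab number 0 plays the role of "your" lab, and the values of epsilon, the number of labs, and the number of simulated universes are arbitrary.

```python
import random

def simulate(eps, n_labs, n_universes, seed=0):
    """Return (freq_marked_up, freq_any_up) over simulated universes.

    In every universe each of n_labs labs measures 'up' with
    probability eps (the T1 prediction). Lab 0 is the marked lab.
    """
    rng = random.Random(seed)
    marked_up = 0
    any_up = 0
    for _ in range(n_universes):
        outcomes = [rng.random() < eps for _ in range(n_labs)]
        marked_up += outcomes[0]   # "the electron in this lab is up"
        any_up += any(outcomes)    # "at least one lab saw spin up"
    return marked_up / n_universes, any_up / n_universes

freq_marked, freq_any = simulate(eps=0.01, n_labs=500, n_universes=2000)
print(f"marked lab sees up:        {freq_marked:.3f}")
print(f"at least one lab sees up:  {freq_any:.3f}")
```

The first frequency stays close to epsilon no matter how many labs exist, while the second one approaches one as the number of labs grows.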

What does the evidence in the two cases actually say?

Because I don't genuinely believe that they're so confused that they can't distinguish these two clearly different statements, I think that the reason for their confusion must lie elsewhere. I think that they actually misunderstand which of these statements has been empirically justified in the two situations (spinning electron vs life in the Universe).

If we measure the spin to be up, we have actually proven the statement that "the electron spin in our particular lab is up". More concretely, it is the same lab for which we have defined the initial conditions. Quantum mechanics was able to link the initial state of this particular lab with the measurements of the spin in the same lab. It doesn't matter which lab in the Universe it was. The important thing is that we are still talking about the same lab.

If Raphael et al. use the initial conditions in lab No. 2008 and use quantum mechanics applied to T1 to deduce that at least one lab in the whole Universe will see spin up with probability epsilon, they are just using quantum mechanics incorrectly. If they use some combined average information about all labs in the Universe and deduce something about a particular lab, they are making a similar mistake. If they use the same lab both in the initial and final state but they end up with the probability 1-(1-epsilon)^L, they are making a mistake, too. The laws of the realistic quantum mechanical theories are local and only allow us to predict the measurements in the same lab whose initial conditions had to be fed into the machinery to calculate the theoretical predictions. And such a result is independent of other labs and their number.

But the situation with the counting of life on planets (or in the universes) is different. Should an easily acceptable theory predict many planets with life? The answer is a resounding No, as explained above. Where is the difference from the spinning electron thought experiment? The difference is that the possible hypotheses or initial states that we are comparing in the case of life are no longer the initial states of a single lab or a single planet but the possible initial states of the whole Universe.

The whole Universe has no special relationship with any of its planets. So there is a dramatic difference here. In the case of the electron, we have measured the spin to be up in the same, special, marked lab whose initial conditions were used to derive the prediction. T2 correctly predicted the spin to be probably up but the probability was a conditional probability given the assumption that the same lab had certain initial conditions.

On the other hand, in the case of the planets in the Universe, we observe life on at least one, arbitrary, unmarked planet of the Universe but this planet is in no way connected with any special region of the Universe included in the initial conditions or in the defining equations of the theory and the corresponding probability of life is not really a conditional one.

So when we observe life, we observe "life on at least one planet", while when we observe the spin up, we observe "spin up in exactly the same lab whose initial conditions helped to define the very problem". In other words, the quantifiers are different. In the case of life, the empirical evidence only implies that "there exists" at least one planet with life. In the case of the spin, the empirical evidence implies something different and kind of stronger, namely that "in the same particular lab that was talked about when we defined the initial conditions of the problem, the spin was up".
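The effect of the quantifier can be made explicit by comparing likelihood ratios for the two kinds of evidence in the electron example. This is a sketch with assumed independent labs and illustrative numbers; the function names are hypothetical.

```python
import math

def p_exists(p_single, n):
    """P(at least one of n independent trials succeeds) = 1 - (1 - p_single)**n."""
    return -math.expm1(n * math.log1p(-p_single))

def ratio_marked(eps):
    """P(up in the marked lab | T1) / P(up in the marked lab | T2)."""
    return eps / (1 - eps)

def ratio_exists(eps, n_labs):
    """P(up in at least one lab | T1) / P(up in at least one lab | T2)."""
    return p_exists(eps, n_labs) / p_exists(1 - eps, n_labs)

eps, n_labs = 1e-3, 10**6  # illustrative values
print(f"'in this particular lab': T1/T2 = {ratio_marked(eps):.3e}")
print(f"'there exists a lab':     T1/T2 = {ratio_exists(eps, n_labs):.6f}")
```

The statement about one particular lab separates the two theories by a factor of about 1/epsilon, while the "there exists" statement gives a ratio close to one and therefore barely discriminates between them. Which statement the evidence actually supports decides everything.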

In the case of the electron, the initial state of the same lab was a part of the conditions in the conditional probability P(up|conditions) predicted by T1 and this fact makes a huge difference. When we make the theoretical calculation of the observed existence of life, we mustn't impose any a priori conditions on the planet where life is going to be observed: the probability is not really conditional.

If you wanted to defend the statement of Raphael et al. about the typicality of life, you would need a different sort of empirical evidence. You would need to show that there is life on every or almost every planet of this Universe. You would need evidence that our Universe has the property that when you start with a planet, you end up with life with a high probability. Or you would at least need to show that the density of life-bearing planets is high. This is tautologically the evidence that you need to argue that a theory had better predict many copies of life in the Universe, and we obviously don't have any such evidence because we only know one life-bearing planet so far.

To understand the difference between these two things is a kindergarten problem that a kid should be able to figure out in a few minutes. Nevertheless, Raphael, despite his bright mind, has clearly been struggling with this triviality for years to no avail. It seems kind of amazing, and Bayesian reasoning indicates that because he hasn't been able to figure out these basic things for years, it is unlikely (P < 1/(365 x 5)) that he will do it by tomorrow. But I still hope he will! ;-)