Saturday, August 02, 2014

Sleeping beauty in Guantánamo Bay

If Alice could have been to Wonderland, why shouldn't the Sleeping Beauty visit Gitmo?

I've concluded that the elementary mistakes that lead some people to say that the correct answer to the Sleeping Beauty Problem is \(P=1/3\) are the primary cause of these people's totally invalid claims about the arrow of time, about the anthropic principle, and of their irrational fear of the Boltzmann Brains.

When I was a kid, and as recently as 15 years ago, I didn't feel that similar controversies existed about similarly trivial issues. Various mathematical olympiads would pose analogous problems; many people presented wrong answers, but the organizers and the otherwise "adult" people who were expected to know agreed on what the right answers actually were, and they would generally agree with me. These days, we hear lots of self-evidently wrong things from pundits who should know better.

Before we get to Gitmo, let me remind you of the original problem.
The sleeping beauty has the rules of the game – this paragraph – explained to her on Sunday morning. On Sunday afternoon, a coin is tossed. The sleeping beauty is put to sleep for a week on Sunday night. But she is woken up once or twice, depending on the result of the coin toss. If it came up "tails", she is only woken up on Monday. If it's "heads", she is woken up twice, on Monday and Tuesday. When she wakes up, she remembers these rules of the game, but her memory of any previous wakeups has been erased, so she doesn't know whether she has been woken up before. And she is not told the day of the week, either. The question is: what is her correctly calculated subjective probability that the coin is showing "tails" at the moment?
The correct answer is obviously \(P=1/2\). Both results of the coin toss are equally likely, \(P=1/2\), and she knows that this is how the coin works. Both of these hypotheses predict that she would be woken up. When she learns that she woke up, she only knows that she would have woken up at least once. But both hypotheses, "tails" and "heads", guarantee that this observation would take place. So the observation gives her no information whatsoever. Bayes' theorem guarantees that her posterior probabilities therefore remain the same after the observation, \(P=1/2\) both for "heads" and "tails".

The people who want to defend \(P_{\rm tails}=1/3\) like to imagine that one throws some marbles or fruits into a jar whenever she is woken up. See e.g. "another experiment" by Cristi Stoica. After a long time, the number of heads-fruits in the jar will exceed the number of tails-fruits by a factor of two, which they interpret as saying that the probability that she wakes up after "heads" is twice as high.

They completely miss the obvious problem: the sampling bias. The fruits are more likely to be thrown into the jar if they're heads-fruits. It's a bias, a mistake, and it has to be removed if you want to determine the actual probabilities of the underlying phenomenon – the coin, in this case. She is asked about the probability of a statement about the coin, not about fruits, so if she finds the fruits helpful for answering a question about the coin, she must use them correctly. She demonstrably knows about this sampling bias – two heads-fruits are added to the jar in the case of a "heads" coin toss – so she has to take it into account. It means that she must divide the number of heads-fruits by two, and only then interpret the ratio of fruits as the ratio of probabilities. When she does it right, she will of course determine that the probability is still \(P_{\rm tails}=1/2\).
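The bias and its removal can be checked numerically. Here is a minimal Monte Carlo sketch; the variable names and the simulation setup are my own, not part of the original problem:

```python
import random

# Each simulated week tosses a fair coin; "heads" produces two wakeups
# (Monday and Tuesday), "tails" one wakeup (Monday only).
rng = random.Random(0)
wakeups = {"heads": 0, "tails": 0}
weeks = 100_000
for _ in range(weeks):
    coin = "heads" if rng.random() < 0.5 else "tails"
    wakeups[coin] += 2 if coin == "heads" else 1

# Raw per-wakeup counts are biased roughly 2:1 toward heads -- the sampling
# bias. Dividing the heads count by two (one wakeup too many per heads week)
# removes the bias and recovers the true probability of the coin:
p_tails = wakeups["tails"] / (wakeups["heads"] / 2 + wakeups["tails"])
```

Counting wakeups instead of weeks is exactly the fruit-in-the-jar procedure; the final division is the bias correction described above, and `p_tails` comes out close to 1/2.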

Instead of fooling herself by the "same-size fruit" for differently likely events, she – if she wants to avoid later calculations – could directly throw half a pound of brown butter into the jar whenever she wakes up after "heads", and one pound of yellow butter if she wakes up after "tails". Then, after a long time, the relative amount of the butter of different colors could be directly interpreted as the probability ratio, and be sure that "tails" and "heads" would be equally likely once again, therefore \(P=1/2\).
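The butter-weighting trick can be sketched the same way. The weights of half a pound and one pound follow the text; the implementation details are mine:

```python
import random

# Weighted jar: each wakeup after "heads" adds 0.5 lb of brown butter,
# each wakeup after "tails" adds 1 lb of yellow butter.
rng = random.Random(1)
brown = 0.0   # pounds added after "heads" wakeups
yellow = 0.0  # pounds added after "tails" wakeups
for _ in range(100_000):
    if rng.random() < 0.5:   # heads: two wakeups, 0.5 lb each
        brown += 0.5 + 0.5
    else:                    # tails: one wakeup, 1 lb
        yellow += 1.0

# The weights cancel the sampling bias up front, so the amounts of butter
# can be read directly as the probability ratio:
p_tails = yellow / (brown + yellow)
```

Because each heads week contributes 2 × 0.5 = 1 lb and each tails week 1 × 1 = 1 lb, the two colors accumulate at equal rates and `p_tails` is close to 1/2 with no extra correction needed.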

All the "thirders", defenders of \(P=1/3\), seem to be very sloppy, so no one has actually presented a clear argument for why the answer should be \(P=1/3\). Let me try to fill this gap.

A Bayesian derivation of \(P=1/3\)

There are 7 days in the week and 2 possible results of the coin toss. It means that we have 14 competing hypotheses \(H_{cd}\) describing both the "state of the coin" \(c\) and "today's day of the week" \(d\). The hypotheses are Monday-tails, Monday-heads, Tuesday-tails, Tuesday-heads, and so on.

By the \(\ZZ_7\) symmetry between the days and the \(\ZZ_2\) symmetry between the sides of the coin, and by the a priori independence of these two quantities, we may argue that the probability of each combination is \(P_{cd}=1/14\), OK? Let me write the full table of prior probabilities for you.

\(P(H_{cd})\)   Heads   Tails
Monday          1/14    1/14
Tuesday         1/14    1/14
Wednesday       1/14    1/14
Thursday        1/14    1/14
Friday          1/14    1/14
Saturday        1/14    1/14
Sunday          1/14    1/14

Now, when she wakes up, only 3 possibilities out of the 14 survive while the remaining 11 combinations are eliminated, right? So the table after this "collapse" – after a step of the Bayesian inference – seems to be

\(P({\rm wakeup}|H_{cd})\)   Heads   Tails
Monday                       1       1
Tuesday                      1       0
Wednesday                    0       0
Thursday                     0       0
Friday                       0       0
Saturday                     0       0
Sunday                       0       0

These are the conditional probabilities \(P({\rm wakeup}|H_{cd})\).

The denominator in Bayes' formula only means that we have to uniformly renormalize all the probabilities of the 14 hypotheses to guarantee that they sum up to one. That means that we must adjust the probabilities above to

\(P(H_{cd}|{\rm wakeup})\)   Heads   Tails
Monday                       1/3     1/3
Tuesday                      1/3     0
Wednesday                    0       0
Thursday                     0       0
Friday                       0       0
Saturday                     0       0
Sunday                       0       0

The table shows the conditional probabilities \(P(H_{cd}|{\rm wakeup})\).

At any rate, the three surviving arrangements, "heads-Monday", "heads-Tuesday", and "tails-Monday", seem to be equally likely. We may sum them up which means that after she learns that she was just woken up, the probability of Monday is 2/3, the probability of Tuesday is 1/3. The probability of heads is 2/3, the probability of tails is 1/3.

It sounds fairly good, doesn't it?

Maybe, but what's more important is that it is wrong. Let's look at how Bayes' formula actually quantifies the posterior probability of the hypothesis:\[

P_{t=PWI}(H_{cd}|{\rm wakeup}) = \frac{ P({\rm wakeup}|H_{cd}) P(H_{cd}) }{ P({\rm wakeup}) }

\] where \(c\in \{{\rm heads}, {\rm tails}\}\) is the state of the coin and \(d\) is one of the seven days of the week (the proposition is "today is \(d\)"). The denominator is just a normalization factor, independent of the hypotheses, calculated so that the sum of the probabilities of the hypotheses remains equal to one after we perform one step of Bayesian inference.

An important detail is that I added the subscript \(t=PWI\) which means that these are subjective probabilities evaluated right after a "particular wakeup incident" (PWI). You know, subjective probabilities depend on time – that's why we have the "inference". It is a bit dangerous to parameterize them by some "subjective time" but it is nevertheless possible to do so, if we are careful.

The prior probabilities \(P(H_{cd})\) are all equal to \(1/14\) in our case. I think that this is a claim that the "thirders" would endorse because the democracy between these 14 options is really the main "experience" that impresses them into thinking that the surviving 3 options are equally likely, too.

OK, the last piece of the story that matters is the set of probabilities\[

P_{t=PWI}({\rm wakeup}|H_{cd})

\] which is the probability of the "wakeup" predicted by one of the 14 "hypotheses" (coin-day arrangements) \(H_{cd}\). I've said what the "thirders" implicitly assume about the value of \(P({\rm wakeup}|H_{cd})\): it's equal to \(1\) for the three surviving options and \(0\) for the remaining eleven options. That's why they believe that the probabilities of the three surviving options remain the same.
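For concreteness, here is the thirder's update spelled out as a short exact calculation (the dictionary encoding of the hypotheses is my own choice):

```python
from fractions import Fraction

# Priors: 1/14 for each (coin, day) combination.
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
priors = {(c, d): Fraction(1, 14) for c in ("heads", "tails") for d in days}

# Thirders set P(wakeup|H) = 1 for the three surviving combinations, 0 else.
survivors = {("heads", "Mon"), ("heads", "Tue"), ("tails", "Mon")}
likelihood = {h: Fraction(int(h in survivors)) for h in priors}

# One step of Bayesian inference: multiply by the likelihood, renormalize.
norm = sum(likelihood[h] * priors[h] for h in priors)
posterior = {h: likelihood[h] * priors[h] / norm for h in priors}

p_tails = posterior[("tails", "Mon")]   # the thirder answer: 1/3
```

With these (wrong, as argued below) likelihoods, each of the three survivors indeed ends up with posterior probability exactly 1/3.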

I am keeping the \(t=PWI\) as the subscript to indicate "when" the probabilities are evaluated.

However, we must avoid sloppiness if we want to calculate these "predicted" conditional probabilities – and therefore also the final result of the "sleeping beauty problem" – correctly. I have deliberately been sloppy so far because I wanted to emulate a thirder (after his IQ was increased by 10 points or so; I am not a good enough actor to play truly dense people).

What does the evidence in the conditional probabilities, the "wakeup", really say if we are careful and not sloppy? The actual evidence that she can extract from being woken up tells her that:
I was woken up today, i.e. on an unknown day. It isn't guaranteed that the current time \(t=PWI\) is equal to a particular day \(d\).
When she is woken up, she doesn't really learn any particular unambiguous information about which day "today" is. So her opinion about "the day today" doesn't necessarily have to agree with the particular value of the index \(d\) of the hypothesis we evaluate.

You might think that I am just being talkative or picky and that the "thirders" realize that and take that correctly into account. However, they don't. This clarification makes a lot of difference. It's all the difference you need to correct the wrong result \(P=1/3\) and replace it by the right result \(P=1/2\).

How does it work? Let me write the full table of the conditional probabilities – the predictions for the wakeup – because the tables look elegant given the ease with which I learned how to press CTRL/C and CTRL/V. :-)

\(P_{t=PWI}({\rm wakeup}|H_{cd})\)   Heads   Tails
Monday                               1/2     1
Tuesday                              1/2     0
Wednesday                            0       0
Thursday                             0       0
Friday                               0       0
Saturday                             0       0
Sunday                               0       0

These are the correct probabilities. After you apply Bayes' formula – reversing the order of the two arguments of the conditional probability and normalizing – the table becomes

\(P_{t=PWI}(H_{cd}|{\rm wakeup})\)   Heads   Tails
Monday                               1/4     1/2
Tuesday                              1/4     0
Wednesday                            0       0
Thursday                             0       0
Friday                               0       0
Saturday                             0       0
Sunday                               0       0

The probability of "tails" is \(P=1/2\) while the two wakeup days share the remaining \(P=1/2\) reserved for the "heads".

OK, let's return to the previous table with the entries \(1/2,1/2,1\) and justify it. The hypothesis "tails Monday" predicts the "wakeup on Monday" with the probability 100%. That's the uncontroversial part. The part where the "thirders" would be making their mistake – if they were able to present a Bayesian derivation at all – is in the values\[

\begin{aligned}
P_{t=PWI}({\rm wakeup}|H_\text{heads,Monday}) &= 1/2\\
P_{t=PWI}({\rm wakeup}|H_\text{heads,Tuesday}) &= 1/2
\end{aligned}

\] Their wrong assumption leading to the value \(P=1/3\) for the sleeping beauty problem is that both of these probabilities are equal to one. Why are these predicted probabilities equal to one-half and not one?

It's because
the hypothesis "heads Monday" (just like "heads Tuesday") predicts that there is a wakeup both on Monday and on Tuesday – these two wakeups come together, they are not mutually exclusive – so even though the hypothesis asserts that "today is Monday", there is a 50% probability that the wakeup at \(t=PWI\) occurs on Monday and a 50% probability that it occurs on Tuesday.
Both the "Monday heads" and "Tuesday heads" hypotheses share the possibility that she wakes up with \(t=PWI\) equal to Monday or Tuesday, so these predicted conditional probabilities are 50%.
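Running the corrected likelihoods \(1/2, 1/2, 1\) through the same kind of Bayesian calculation gives the halfer answer (the encoding of the hypotheses is again my own):

```python
from fractions import Fraction

# Priors: 1/14 for each (coin, day) combination, as before.
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
priors = {(c, d): Fraction(1, 14) for c in ("heads", "tails") for d in days}

# Corrected likelihoods: each "heads" hypothesis predicts that this
# particular wakeup incident falls on its day with probability 1/2, while
# "tails Monday" predicts it with probability 1.
likelihood = {h: Fraction(0) for h in priors}
likelihood[("heads", "Mon")] = Fraction(1, 2)
likelihood[("heads", "Tue")] = Fraction(1, 2)
likelihood[("tails", "Mon")] = Fraction(1)

norm = sum(likelihood[h] * priors[h] for h in priors)
posterior = {h: likelihood[h] * priors[h] / norm for h in priors}

p_tails = posterior[("tails", "Mon")]                                # 1/2
p_heads = posterior[("heads", "Mon")] + posterior[("heads", "Tue")]  # 1/2
```

The posteriors come out 1/4, 1/4, 1/2 for "heads Monday", "heads Tuesday", and "tails Monday", matching the table above.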

Let me get to Gitmo.

Guantánamo Bay twist

Let's modify the problem to make it more extreme. She is still woken up once or twice but there is a new effect:
With the probability 1/5,000, i.e. approximately once in 100 years, the sleeping beauty is accused by the CIA of having drunk Siberian Crown, a Russian beer, with Agent Mulder. The CIA Heads Office therefore storms the building and transfers the woman to Guantánamo Bay.

They don't wake her on Monday, they don't wake her on Tuesday, but they wake her up on Wednesday. 50,000 times, approximately once per second. They erase her memory before each incident.
Now, sometime during the week, the woman wakes up again. What is the correctly calculated subjective probability that the coin came up heads? Tails? CIA?

If you monitor the system for a long enough time (assuming her longevity etc.), a vast majority of the wakeup incidents will be wakeups by the CIA. After 5,000 weeks, there will be 50,000 CIA wakeups on Wednesday on average, and only slightly below 5,000 regular wakeups after "heads" and slightly below 2,500 regular wakeups after "tails".
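The expected counts can be verified with straightforward arithmetic; the numbers come from the text, the variable names are mine:

```python
# Expected wakeup counts over 5,000 weeks in the Guantánamo variant.
weeks = 5_000
cia_weeks = weeks // 5_000          # the raid happens ~once per 5,000 weeks
regular_weeks = weeks - cia_weeks   # 4,999 ordinary weeks

heads_wakeups = (regular_weeks / 2) * 2   # two wakeups per "heads" week
tails_wakeups = (regular_weeks / 2) * 1   # one wakeup per "tails" week
cia_wakeups = cia_weeks * 50_000          # 50,000 wakeups in the one CIA week

# Per-wakeup counts are dominated by the CIA scenario, even though its
# per-week probability remains the tiny prior 1/5,000:
cia_fraction = cia_wakeups / (cia_wakeups + heads_wakeups + tails_wakeups)
```

The single CIA week contributes about 87% of all wakeup incidents, which is exactly the kind of raw count that misleads a "thirder"-style reasoner.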

Using the marbles or fruits, we see that the CIA scenario dominates.

Imagine that you are in the position of the sleeping beauty and you will wake up sometime in the week. Do you really believe that it follows that you have probably been captured by the CIA?

I think that if you are not a conspiracy nut and a whackadoodle, you realize that the CIA scenario is very unlikely – it occurs once in a century – so you would have to be very unlucky to become a victim of that. The fact that you are woken up 50,000 times can't really change this fact. These 50,000 wakeups just share a predetermined, reserved, "small piece of the probability pie".

Effects with extremely low probabilities – e.g. those expected to occur less than once since the Big Bang – may simply be neglected even if you associate them with a marvelous, huge, far-reaching impact. It is common sense. You can't inflate the likelihood of a (conspiracy) theory that looks extremely unlikely by decorating it with lots of big numbers describing what happens if it happens.

One might argue that this "fallacy of the thirders" is an important driver behind people's beliefs in the Boltzmann Brains, claims that statistical physics doesn't explain the arrow of time, and even global warming.

The Boltzmann Brains (or at least their worshipers) think that just by inventing quadrillions of people who live "somewhere" and who are similar to you, it becomes likely that you are one of them. But it doesn't, because there is no real "democracy" or "equivalence" between you and random elements of that ensemble. Equal probabilities are only justifiable if there exists an argument, e.g. a symmetry argument. But we're qualitatively different – at least in our environment – from the Boltzmann Brains, so there is no reason why the two theories about our location should be equally likely. It's perfectly sensible – and I think it is right – to say that the theories arguing that "we are Boltzmann Brains" are extremely unlikely, indefensible, so it doesn't matter a single bit that the people in those theories look exactly like us. The theory "we are Boltzmann Brains" itself has a small probability to start with, and it may at most divide its "small pie" among sub-theories.

For a positive example of the symmetry, in the Sleeping Beauty Problem, there is a symmetry \(\ZZ_2\) between the two sides of the coin, and that's why these two sides are equally likely. Similarly, in the "heads" case, there is a symmetry between Monday and Tuesday, which is why these two days are equally likely assuming "heads". When combined, the probabilities are 50%, 25%, 25% for the three arrangements. Just because the three combinations look similar, the "thirders" assume a non-existing \(\ZZ_3\) or \(S_3\) symmetry between "Monday heads", "Tuesday heads", and "Monday tails" to claim that the probabilities are \(1/3\). But there is no symmetry which is why there is no reason to think that the probabilities are the same.

Of course, people who believe that there is a "paradox" in the low entropy of the early Universe apparently suffer from the same "fallacy of the thirders". The idea is that all the microstates are equally likely as initial states of the early Universe, so the early Universe should be expected to have had the maximum entropy. But it's simply not true that the initial microstates are equally likely. The only justification why microstates are sometimes equally likely is based on thermalization and the ergodic theorem. If you let the system evolve chaotically for a while, transfer the heat, and so on, things get balanced and every basis vector or point of the phase space that could have been reached is reached with the same probability. But we know for sure that the probabilities are not equal at the beginning. The initial state of any physical system, including the whole Universe, is a special one, one with a low entropy, not a high-entropy generic microstate. This fact contradicts nothing that science has found. Instead, it is an implicit corollary of the second law of thermodynamics. To say the least, the second law would be vacuous if the maximum entropy were reached at all times. It's not.

But note that this conclusion requires the process of thermalization – process when the entropy is actually increasing and the non-uniformities are being smeared out. If you don't wait, i.e. if you talk about the very initial state of a physical system, there has been no time for this thermalization, and there is therefore absolutely no reason to think that the different microstates are equally likely.

The anthropic people believe in the fallacy that the probability of a particular theory may be increased by an arbitrarily high factor if the theory predicts a large Universe or a Universe with very many stars. Sometimes they think it's enough for the Universe to be long-lived because it's the integrated population that matters etc. I don't want to discuss the details of these mistakes and which of them are mistakes because most of the details are unrelated to the Sleeping Beauty Problem.

I added the "global warming" believers for two unrelated reasons.

One of them is Pascal's Wager. Some people think that they can make the ludicrous global warming fears "more important" by inventing a "bigger impact" of the global warming catastrophe. Perhaps, the whole planet will completely burn out. So even if it is absurdly unlikely, it would be so huge that the expectation value of damages is still gigantic.

If you actually calculate the expectation value correctly, you get a small number because no global destruction is really possible in practice. But the calculation for realistic threats is not the point here. The point is that if you talk about catastrophes that are likely to happen once in a trillion years, it just doesn't matter whether their impact is huge or not. Effects of such low probabilities may simply be assumed not to exist. People whose thinking is overwhelmed by possible events that occur less than once in a trillion years are nuts.

The other reason why I added the "global warming" believers is that many of them love sampling bias. If you ask Michael Mann and dozens of his "peers" about "global warming" many times, you may find out that 97% of the answers support the "global warming" fears. But that doesn't mean that these fears are justified. It doesn't even mean that most of the competent people share the fears (which would still not be sufficient to conclude that the fears are justified). You have been just asking people in a biased sample. Even if you count people who write into some standardized climate science journals, they are still a heavily biased sample – and they aren't really among the most competent people on the planet to discuss such matters rationally or scientifically. The biased sample is a big defect, one that you have to fix if you want to deduce sensible and defensible conclusions.

So the "thirders' fallacy" may be the pseudointellectual Urquell of many kinds of conspiracy theories by various types of whackadoodles who believe that the arguments that "something is extremely unlikely" may be "beaten" (and perhaps, the probabilities themselves may be rewritten and inflated) by inventing far-reaching fantasies "how important it would be if the unlikely thing were true" or simply by repeating (or forcing you to observe) one picked answer many times.


  1. Good analysis IMO, but I can see the dumb rabble saying they interpret the problem and its framing differently. They are wrong. It is not the Monty Hall problem.

    Also, changing horses for a moment, not only are we not Boltzmann brains, but they most likely don't exist either. In an infinite universe or multiverse, not every possibility has to eventuate – some "possibilities" could have 0% probability of occurring. There are "smart" thought experiments of the Einstein variety, and there are "dumb" thought experiments of the XXXX (fill in name) variety.

  2. The Monty Hall Paradox is contingent upon knowing one of the outcomes. The coin flip has all unknown outcomes. The coin flip is not the Monty Hall Paradox.

    If one of your postulates is empirically defective, your rigorously derived axiomatic system is no better when it is contingent upon that postulate.

  3. True, Gordon.

    If I may embellish your statement: some possibilities may fail to eventuate *even* if they contain creatures that are identical to us, electron by electron. If these observers are there and are the same as us, it doesn't mean that their possibility is as likely as the possibility we know to be true. ;-)

    I actually forgot to mention the "many worlds". A key mistake in the many worlds is to think that the "two worlds" are equally likely just because the two options look equally real. They may look equally real as options but the whole point of the concept of probability is that the probabilities don't have to be equal to each other and to 1/2 or 1/N each.

    The Monty Hall problem has some other subtleties, about the behavior of the host and assumptions about that, and so on. These complications are avoided in the Sleeping Beauty Problem. On the other hand, the "exactly repeated experience" is avoided in the Monty Hall Problem so I wouldn't agree that the problems are equivalent in any way.

  4. Dear Uncle Al, I would agree that the problems aren't equivalent. But there is a factor of chance in the Monty Hall paradox, too. The door where the prize is hidden has also been randomly chosen, hasn't it?

  5. Dear Lubos,
    let's say the beauty is offered 100 Euros for each time she is woken up and guesses right. If she says Tails always, she has a 50% chance of winning 100 Euros. If she says Heads always, she has a 50% chance of winning 200 Euros. I think this is what Prof. Polchinski meant by defining probabilities with betting. I think the CIA example works very similarly. If you bet on the CIA, you most likely don't win anything. But you have a small chance of winning a huge amount of money. You can adjust the numbers so that betting on the CIA gives you a higher expectation value for the money you win. It is just that the expectation value is not what you care about here.

  6. Right, but the unit of your profit is one dollar so it clearly means that you can't interpret the profit from a particular bet as a probability.

    If you write the formulae correctly, you will be able to derive from the profits that the probability is 1/2, even if you buy 2 shares in the "heads" weeks. See e.g. this particular DISQUS comment

    Search for "The expectation value of the benefits is" at