Friday, December 30, 2005

Bayesian probability I

Two days ago, we had interesting discussions about "physical" situations where even the probabilities are unknown.

Reliable quantitative values of probabilities can only be measured by the same experiment repeated many times. The measured probability is then "n/N" where "n" counts the "successful measurements" among all experiments of a certain kind whose total number is "N". This approach defines the "frequentist probability", and whenever we know the correct physical laws, we may also predict these probabilities. If you know the "mechanism" of any system in nature - which includes well-defined and calculable probabilities for all well-defined questions - you can always treat the system rationally.
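The frequentist recipe above can be sketched in a few lines. The "mechanism" here is a hypothetical biased coin with a known probability, chosen purely for illustration; the point is that n/N converges to that value as N grows:

```python
import random

# Sketch: measuring a frequentist probability as n/N.
# The "true" mechanism (success probability 0.3) is an assumption
# made up for this illustration.
random.seed(0)

TRUE_P = 0.3
N = 100_000                                            # total experiments
n = sum(random.random() < TRUE_P for _ in range(N))    # successful ones

estimate = n / N                                       # frequentist probability
print(f"measured n/N = {estimate:.3f}, true p = {TRUE_P}")
```

With a hundred thousand trials, the measured ratio lands within a fraction of a percent of the true value, which is exactly why repeated experiments are the reliable way to pin probabilities down.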

Unknown probabilities

It is much more difficult when you are making bets about some events whose exact probabilities are unknown. Even in these cases, we often like to say a number that expresses our beliefs quantitatively. Such a notion of probability is called Bayesian probability and it does not really belong to exact sciences.

Vending machines

For example, suppose you pay three quarters to a vending machine to get a Coke that costs one dollar. The third quarter is swallowed but not counted. Should you try to feed two (or more) quarters into the same vending machine, or should you rather choose the next machine, which is more likely (and let's assume that it is guaranteed) to work correctly but where you will have to pay 4 more coins?

If you knew the probability that the first machine is going to steal your coins - for example, imagine that someone told you that the machine steals every coin with probability P, independently of the others - you could solve this problem mathematically and calculate which strategy has the lower expectation value of the total amount of money you will have to pay.

However, you don't know P. Because the machine has stolen one quarter out of three, you may think that the probability of a "theft" is around 1/3. With the number P=1/3, you may again derive the optimal answer. (In fact, taking the risk and continuing with the unreliable machine is cheaper on average.)
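The expectation-value comparison can be made explicit. Assuming each inserted quarter is stolen independently with probability P, each accepted coin takes a geometric number of attempts, so it costs 1/(1-P) quarters on average; two more coins still need to be counted, while the reliable machine demands 4 fresh quarters:

```python
# Sketch of the expectation-value comparison from the text.
# Assumptions: thefts are independent with probability p_theft,
# two more quarters must still be counted, and the machine next
# door reliably takes 4 fresh quarters.

def expected_cost_risky(p_theft: float, coins_needed: int = 2) -> float:
    """Expected quarters fed into the unreliable machine.

    Each accepted coin is a geometric trial with success probability
    (1 - p_theft), costing 1 / (1 - p_theft) quarters on average.
    """
    return coins_needed / (1.0 - p_theft)

COST_RELIABLE = 4.0      # quarters, guaranteed

p = 1.0 / 3.0            # naive estimate: 1 theft out of 3 coins
print(expected_cost_risky(p))   # ≈ 3 quarters on average
print(COST_RELIABLE)            # 4 quarters for certain
```

With P=1/3 the risky strategy costs about 3 quarters on average, beating the guaranteed 4 - which is why continuing with the unreliable machine comes out cheaper.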

But if you were just "lucky" that only one coin was stolen, the probability that the machine will steal your money can be close to one. In this case, abandoning this machine is clearly cheaper. An important conclusion is that there is no canonical way to determine the probability that the "theft rate" of the vendor machine is between P and P+dP. (I used the words "theft rate" because once you interpret this observable as a permanent characteristic of the machine, it is no longer a probabilistic observable.)

Imagining the distribution for the theft rate

You may imagine that the distribution of P is Gaussian, centered around 1/3 and with a width determined by the fact that you have only made 3 measurements. But it is very important how this distribution behaves near P=1. If it were non-zero near P=1 (like the Gaussian), the expected number of coins you would have to pay would actually be logarithmically divergent.

You actually know that P cannot be exactly one because two coins have been counted correctly. But even if the probability distribution goes to zero near P=1, it may do so terribly slowly (so that it is still pretty big even when you're very close to P=1), and you may still obtain a divergence.
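The divergence can be written down explicitly. If each coin is stolen with probability P, the two remaining coins cost 2/(1-P) quarters on average, so averaging over a prior density f(P) for the theft rate gives

```latex
\langle C \rangle \;=\; \int_0^1 f(P)\,\frac{2}{1-P}\,\mathrm{d}P .
```

Since \(\int \mathrm{d}P/(1-P) = -\ln(1-P)\) blows up as \(P\to 1\), any f(P) bounded away from zero at P=1 makes the expected cost logarithmically divergent; and even a vanishing f(P) can diverge if it vanishes slowly enough, e.g. \(f(P)\propto 1/(-\ln(1-P))\), for which the integral behaves like \(\ln(-\ln(1-P))\).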

This is where rational thinking ends and religion starts. You simply can't know whether these extremely unlikely events are just very unlikely or absurdly unlikely. And depending on the probability that the theft rate is close to one, you will obtain different conclusions about the optimal strategy.

Insurance and averaging

Whenever we talk about phenomena that occur many times and whose losses as well as benefits are "minor" relative to what we can afford, the expectation values are the only truly "rational" measure of the quality of different decisions. For example, a billionaire would be stupid to buy all tickets in the lottery because he knows that 15 percent of his payment (or whatever percentage can be calculated almost exactly) would go to the company that runs the lottery.
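The lottery arithmetic is trivial but worth spelling out: whoever buys every ticket collects the entire prize pool, which is only the ticket revenue minus the operator's cut (the 15 percent figure from the text, used here as an assumption):

```python
# Toy arithmetic behind the lottery remark: buying *all* tickets
# guarantees winning the whole prize pool, which is the revenue
# minus the operator's cut. The 15% cut is the figure quoted in
# the text, taken here as an assumption.

HOUSE_CUT = 0.15

def guaranteed_loss(total_ticket_cost: float) -> float:
    """Money lost by a player who buys every ticket."""
    return total_ticket_cost * HOUSE_CUT

print(guaranteed_loss(1_000_000))   # ≈ $150,000 lost on $1M of tickets
```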

Such a billionaire would be stupid but he would still be incredibly less stupid than a country that codifies the Kyoto protocol.

In a similar way, a millionaire does not need insurance against many "minor" things because even in this case, he can calculate pretty accurately what percentage of his payment will be swallowed by the insurance company. On the other hand, a millionaire can also afford to pay for insurance even if it statistically means a loss for him. Millionaires can afford to behave irrationally, in all possible directions.

Huge lotteries and critical insurance

But when you are thinking about insurance against an event whose impact would be devastating - or if you are thinking about a lottery where you can win large amounts of money that can "solve everything" - it is clear that rational thinking based on expectation values becomes less important.

Similar issues were relevant when we were thinking about "betting on the climate" and the two sides had vastly different ideas about what the probabilities of different events are. One party thought that the probabilities were 50:50 while the other party thought they were closer to 99:1. In this case, once again, we don't know the true probability. Any assumption in between these two is statistically attractive for both parties. I mentioned that the geometric average of the two odds ratios - close to 90:10 - looks like the fairest assumption, but there is no way to justify this convention or any other convention simply because the true probability is unknown.

Once you agree about the probability, the rules for your bet are, of course, defined by requiring that the statistical average of the amount of money that is won/lost by either party is zero.
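Both conventions above reduce to a couple of lines of arithmetic. The geometric average of the odds ratios 50/50 = 1 and 99/1 = 99 is the square root of 99, roughly 10, i.e. odds of about 91:9; and once a probability p is agreed on, zero expectation fixes the ratio of the stakes:

```python
import math

# Sketch of the two conventions in the text: the (unjustifiable)
# geometric-mean compromise between two quoted odds ratios, and the
# zero-expectation rule that fixes the stakes once p is agreed on.

def geometric_mean_odds(r1: float, r2: float) -> float:
    """Geometric mean of two odds ratios, e.g. 50/50 and 99/1."""
    return math.sqrt(r1 * r2)

def fair_stake_ratio(p_agreed: float) -> float:
    """If the believer wins W when the event happens and pays L
    otherwise, zero expectation (p*W = (1-p)*L) gives W/L = (1-p)/p."""
    return (1.0 - p_agreed) / p_agreed

compromise = geometric_mean_odds(50 / 50, 99 / 1)   # ≈ 9.95, i.e. roughly 90:10
p = compromise / (1.0 + compromise)                 # implied probability ≈ 0.91

print(compromise)
print(fair_stake_ratio(p))
```

The compromise itself remains a convention: the geometric mean has no more justification than any other number between the two quoted ratios.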

Also, when we predict the death of the Universe or any other event that will only occur once, we are outside science as far as experimental tests go. We won't have a large enough dataset to make quantitative conclusions. The only requirement that experiment puts on our theories is that the currently observed reality should not be extremely unlikely according to the theory. For example, the lifetime of our Universe should never be predicted to be much shorter than 14 billion years because it would then be surprising that we are still here.

Which probabilities are scientific

While the text above makes it clear that I only consider the frequentist probabilities to be a subject of the scientific method including all of its sub-methods, it is equally clear that good enough theories may allow us to predict probabilities whose values cannot be measured very accurately (or cannot be measured at all) by experiments. This is no contradiction. Such predictions are still "scientific predictions" but they cannot really be "scientifically verified". Only some features of the scientific method apply in such cases.

1 comment:

1. You said:

It is much more difficult when you are making bets about some events whose exact probabilities are unknown. Even in these cases, we often like to say a number that expresses our beliefs quantitatively. Such a notion of probability is called Bayesian probability and it does not really belong to exact sciences.

I have seen this style of argument against Bayes before. It wrongly assumes that Bayes is about defining probabilities rather than manipulating them.

The Bayesian approach is about doing inference by manipulating joint probabilities in specific ways (e.g. Bayes' theorem). Bayes doesn't really care where its prior probabilities come from, and the approach does not actually define a method for supplying these priors to bootstrap the inference process.

In contrast to your claim above, Bayes is perfectly happy to use experimentally measured frequentist probabilities as priors to bootstrap its inference process, but there are more rigorous ways of using such frequentist data in the Bayesian approach.
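The manipulation the comment describes can be sketched on the post's own vending-machine example. The uniform Beta(1,1) prior below is an assumption (precisely the kind of input Bayes' theorem does not supply); the update itself is the standard conjugate Beta-binomial step:

```python
# Minimal sketch of a Bayesian update for the theft rate P of the
# vending machine in the post. The uniform Beta(1, 1) prior is an
# assumption; Bayes' theorem tells you how to update it, not which
# prior to start from.

def beta_update(alpha: float, beta: float, thefts: int, accepted: int):
    """Conjugate update: Beta prior + binomial data -> Beta posterior."""
    return alpha + thefts, beta + accepted

# Uniform prior over P, then observe 1 stolen coin and 2 counted coins.
alpha, beta = beta_update(1.0, 1.0, thefts=1, accepted=2)

posterior_mean = alpha / (alpha + beta)   # (1+1)/(2+3) = 0.4
print(alpha, beta, posterior_mean)
```

The posterior mean 0.4 differs from the naive frequentist 1/3 precisely because of the prior, which illustrates the comment's point: the inference machinery is fixed, but the prior is an external input.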

The following paper gives a nice axiomatic approach to Bayesian inference, which you might find interesting:

Cox R T, Probability, frequency and reasonable expectation, Am. J. Phys., 1946, 14(1), 1-13.

Essentially, the paper shows that the Bayesian way of manipulating joint probabilities is the only consistent way of doing inference. I have not got a copy of the paper to hand, but I seem to recall that it doesn't mention the word Bayes anywhere in it!