Monday, June 17, 2013

All proofs in natural and social sciences ultimately depend on probabilities

Mel B. has sent me a link pointing to a rather incredible attack by an economics professor on the statistical methods in science that was published in the Financial Post:

Junk Science Week: Unsignificant statistics
Stephen Ziliak doesn't want to believe the existence of the Higgs boson – or any other "proof" in science that is based on the notion of statistical significance. In fact, we learn – in big fonts – that
Statistical significance is junk science, and its big piles of nonsense are spoiling the research of more than particle physicists.
Wow. It's remarkable because with this deep misunderstanding of the very key part of any rational thinking, this Gentleman can't possibly understand anything about the proper verification of theories in economics, his field, either. I would argue that because of this lethal flaw in the author's approach to rational reasoning, it is guaranteed at 5 sigma that your humble correspondent and many other physicists and scientists simply have to be better economists than Mr Ziliak, too. He just can't have a clue about the scientific approach to anything.

Statistical significance is absolutely paramount in the verification of hypotheses in all natural sciences as well as all social sciences that more or less successfully try to emulate the scientific character and success of the natural sciences.

Only in mathematics may we construct rigorous proofs that don't need to mention any probabilities because, in principle, the probability that a mathematical proof is right may be verified to be 100 percent. There's no noise and no uncertainty in a rigorous mathematical proof.

However, this "optimistic observation" has two major limitations. One of them is that mathematics doesn't directly apply to the real world. As long as mathematical concepts, theorems, and their proofs are considered rigorous, they can't be reliably and accurately identified with anything in the real world. So they tell us nothing about Nature, humans, or the society. Claims about Nature, humans, or the society simply don't belong to mathematics. They can't be absolutely certain. They can't be rigorous in the truly mathematical sense.

The second limitation is that people aren't infallible so, for various reasons, even a mathematical proof has a nonzero probability of being wrong. Even if a proof is carefully verified etc., there's always a nonzero probability that the brain or the computer performed an invalid operation that led to the confirmation of a proof that is actually erroneous. The embedding of mathematicians' brains in Nature guarantees that these brains can't quite share the perfectly clean, infallible features of the idealized world of mathematics.

In natural sciences, the verification and falsification of hypotheses – and falsification in particular is the basic methodology that makes observations relevant (and observations have to be relevant for anything that we call science) – always involves measurements that have some uncertainty, a nonzero error margin, or a risk that a phenomenon is caused by different causes than those we want to search for. This is a fact: the world is simply messy and complicated. It is partly unpredictable. It is not a clean and transparent celestial sphere with perfectly spherical angels.

We may develop mathematical models and theories that are meant to match the observations and they may be free of any remarks about error margins, backgrounds, or false positives. But as soon as we do anything that remotely involves the theories' verification (and in sciences, the verification ultimately boils down to empirical verification), we simply have to acknowledge that each measured quantity has a nonzero error margin because it can't be measured quite accurately. We must also acknowledge that an event that looks like a proof of some new phenomenon predicted by a theory may actually have been caused by a more mundane (while perhaps rarer and less likely) effect that combines the known mechanisms.

We must not only acknowledge it but we must also quantify all these things. We must know whether the error margin of a measurement is small enough so that the measurement is useful and trustworthy concerning the validity of a proposition. In the same way, we must know whether it's conceivable that the event apparently proving a new effect is actually caused by a combination of an older, less extraordinary theory combined with some reasonable amount of good luck.

For all these things, we have to quantify the probabilities.

The Higgs boson was officially discovered once the pairs of photons or Z-bosons with the right energies, which really look like they are coming from a new particle with a mass of 125-126 GeV, became so numerous that such a spike in the number of these events was very unlikely to appear without a new particle. By "very unlikely", particle physicists mean a chance of "1 in 3 million", also known as "5 sigma", that the excess was a fluke that appeared in a world without a new particle.

Some disciplines of science try to be as hard and reliable as particle physics so they adopted the same 5-sigma (1 in 3 million) standard for discovery; most other disciplines, especially soft sciences such as medical research, climate science, psychology, and others, are often satisfied with 3-sigma (1 in 300) or even 2-sigma (1 in 20) evidence.

The number of sigmas measures the deviation from the null hypothesis. A null hypothesis is some simple enough explanation "without new players" that admits some controllable noise according to some calculable statistical treatment. If it predicts that a quantity \(X\) has the value \(X_0\pm \Delta X\) where the distribution is normal (and it is very often almost exactly normal, and even if it is not normal, we usually know what it looks like and we can calculate the probabilities for other distributions as well), i.e. \(C\times \exp[-(X-X_0)^2/(2\Delta X^2)]\) where \(C\) is chosen so that the "total probability of any possibility" equals one, then it is possible to calculate that the probability that \(X\) exceeds \(X_0+5\Delta X\) is approximately 1 in 3.5 million (and the probability that it lands outside the interval \((X_0-5\Delta X,X_0+5\Delta X)\) on either side is about 1 in 1.7 million), which is so tiny that physicists are willing to take the risk and announce the discovery.
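To make the quoted odds concrete, here is a minimal Python sketch that assumes an exactly normal distribution for the null hypothesis. The function names `one_sided_p` and `two_sided_p` are mine, chosen only for illustration; the only library call is the standard `math.erfc`. It is a rough check of the numbers, not the procedure the LHC collaborations actually use.

```python
import math

def two_sided_p(n_sigma):
    """Probability that a normal variable lands more than n_sigma
    standard deviations away from its mean, in either direction."""
    return math.erfc(n_sigma / math.sqrt(2))

def one_sided_p(n_sigma):
    """Probability that a normal variable lands more than n_sigma
    standard deviations above its mean (an upward fluke only)."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))

if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        print(f"{n:>2} sigma: one-sided 1 in {1 / one_sided_p(n):.3g}, "
              f"two-sided 1 in {1 / two_sided_p(n):.3g}")
```

Under these assumptions, the one-sided 5-sigma tail comes out near 1 in 3.5 million and the two-sided one near 1 in 1.7 million, so "1 in 3 million" is a rounded version of the one-sided number. Similarly, "1 in 20" and "1 in 300" are rounded two-sided values for 2 and 3 sigma.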

The total significance of the deviation from the Higgs-less null hypothesis is now around 10 sigma or so which makes us really sure that the Higgs-like excess isn't just a fluke. The probability that the excess is just a fluke – a collection of coincidences – is much smaller than 1 in a quadrillion. These numbers are so large because \(\exp(-x^2)\) decreases really quickly with \(x\), more quickly than exponentially, in fact.
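The faster-than-exponential decrease can be written down explicitly. As a sketch, assuming an exactly normal distribution and writing \(n\) for the number of sigmas (my notation), the one-sided tail probability and its standard large-\(n\) approximation are

\[
P(X > X_0 + n\,\Delta X) = \frac{1}{\sqrt{2\pi}} \int_n^\infty e^{-t^2/2}\,dt \approx \frac{e^{-n^2/2}}{n\sqrt{2\pi}} .
\]

For \(n=5\) the right-hand side gives roughly \(3\times 10^{-7}\) and for \(n=10\) roughly \(8\times 10^{-24}\). Doubling the number of sigmas therefore doesn't square the odds; it raises them to about the fourth power, up to the slowly varying prefactor.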

When the discrepancy between a theory and the observation becomes this high, we may eliminate the null hypothesis (in this case, a crippled Standard Model where the Higgs is amputated). This is the process of falsification and it's the key empirically rooted procedure by which any science makes some progress in its ability to distinguish viable hypotheses from the disproved ones. To disprove a (null) hypothesis is this straightforward. On the other hand, we can never "quite prove" any detailed theory because there's always a possibility (and, with an exception of the truly final theory, pretty much certainty) that more extensive and accurate experiments in the future will falsify the latest best theory, too. Equivalently, the absence of a statistically significant (e.g. 2-sigma or 5-sigma) deviation in the latest data doesn't mean that the null hypothesis is right and will be right forever. It just means that the deviations as displayed in the performed experiments are smaller than a certain bound which implies that the current theory is "practically" correct. In the future, a discrepancy may be found in more accurate, refined, or extensive experiments that may see tinier or subtler effects than what we can see today.

One simply can't ever deduce any conclusions from the empirical data with absolute certainty. It's always important to acknowledge that an uncertainty is there. And because such an uncertainty may compromise the conclusions, it's always important (sometimes more important, sometimes less important, but never quite forgettable) to quantify the uncertainty, i.e. to know how large it is. The most invariant way of quantification is ultimately one in terms of the probability that a conclusion is invalid because an anomalous observation or a "smoking gun" wasn't really caused by the new effect whose existence we wanted to prove but rather by some good luck (or bad luck) – an amount of luck that can't be quite small (because, as we assume, the observation doesn't look like the most typical prediction of the null hypothesis) but it can't be too large (because it may still realistically happen).

All this methodology is absolutely essential for any controlled, reliable enough empirical tests of any theory or any hypothesis in any natural or social science. We may only discuss how high our certainty should be for us to authoritatively claim that our experiments or observations have established something (the requirements may depend on the context a little bit). 5-sigma is the usual standard of the hardest sciences (led by particle physics) for a discovery. It wouldn't hurt if other sciences adopted the same standards. When a dataset produces 2-sigma excesses, which still have a substantial, "1 in 20" risk of a false positive, you only need a dataset that is 2.5 squared, i.e. 6.25, times larger to achieve a 5-sigma excess where the risk of a false positive is just "1 in 3 million", because the significance of a real effect grows like the square root of the size of the dataset. I am confident that science would be much clearer if surveys with mere 2-sigma excesses were summarized as inconclusive ones. Lots of bad and questionable results in soft sciences are caused by their low standards on how many sigmas we need. These bad apples have far-reaching consequences because many other papers try to build on these bad apples, and so on.
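The 6.25 figure follows from the usual rule of thumb that, for a real effect with purely statistical noise, the significance grows like the square root of the size of the dataset. A minimal Python sketch of that arithmetic follows; the function names are hypothetical and the square-root scaling is an assumption that ignores systematic errors.

```python
import math

def dataset_factor(current_sigma, target_sigma):
    """How many times larger the dataset must be to raise the
    significance from current_sigma to target_sigma, assuming
    significance scales like sqrt(N) (statistical noise only)."""
    return (target_sigma / current_sigma) ** 2

def scaled_sigma(current_sigma, factor):
    """Expected significance after the dataset grows by `factor`,
    under the same sqrt(N) assumption."""
    return current_sigma * math.sqrt(factor)

if __name__ == "__main__":
    print(dataset_factor(2, 5))   # growth needed to go from 2 to 5 sigma
    print(scaled_sigma(2, 6.25))  # significance after 6.25 times more data
```

If the 2-sigma excess was just a fluke, the extra data will of course tend to make it shrink rather than grow, which is exactly why collecting it is worthwhile.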

But if someone wants to abandon the null hypothesis testing and the notion of statistical significance in general, he is surely throwing out the baby with the bath water. He can't possibly understand how proper science is done; he couldn't have possibly done any empirical research that could be uncontroversially considered scientific. In fact, as we have often emphasized on this blog, all predictions of fundamental theories of physics ultimately have to be probabilistic (even if you remove all the technological limitations of measurement devices etc.) because quantum mechanical postulates have to be universally valid in the whole Universe and every small or large corner of it.

Mr Ziliak tries to excuse his silly remarks by some confusing assertions about the nature of particle physicists' claims about the Higgs boson. The 5-sigma excess doesn't prove the Higgs boson, he says: it could be a Prometheus particle, too. But if he's serious, he misunderstands what terminology means in physics – and science. You are free to use the name "Prometheus" for the Higgs boson; after all, many of us use many other names at various points, such as the God particle or the BEH boson (only Peter Higgs really noticed the extra bosonic excitation named after him). But while the people are free to choose their language and terminology, physics isn't about terminology. Physics is about the observable phenomena. So even if the source of the bump were Prometheus according to your terminology and your belief system, it's still empirically demonstrated that this Prometheus behaves as the Higgs boson. If it looks like a God particle, walks like a God particle, and barks like a Dog particle, then it is a God particle (if you change one Dog to God). It doesn't matter whether someone says it's a Prometheus, too.

At the beginning, the new particle was given uncertain names and it was Higgs-like because there was clearly a new particle-like effect and its properties were compatible with the properties of a Higgs boson. Later, as we were more certain and knew more accurate values of the properties, we became able to falsify the theory that the bump is caused by something that differs too much from the Standard Model Higgs boson. At this point, we have everything we need to call it the Standard Model Higgs boson. By this claim, we don't mean that the Standard Model will forever be the right and complete theory for all observations. It almost certainly won't be. But the observed properties of the Higgs boson falsify so many competing hypotheses and are so nontrivially close to the predictions of the Standard Model Higgs boson that there's no reason not to use this name for the object. So the new particle may be a Prometheus but according to the physical definition of "being a Higgs boson", it is clearly a Higgs boson, too. Physics determines whether something is a Higgs boson by its decays, rates of production, mass, and other interactions, and if those things agree with the Higgs boson's properties, then the particle (whether it is God or Prometheus or anyone else) simply is a Higgs boson and attempts to claim otherwise are just artifacts of a distorted terminology, mistakes, and demagogy.

When I talked about the certainty that the LHC has observed a new particle; a new Higgs-like particle; or a Standard-Model-like Higgs boson (these phrases are increasingly accurate and increasingly strong), I only took the (almost) purely experimental data into account. Aside from these nearly direct observations, we have nearly rock-solid theoretical arguments – that I won't offer to Mr Ziliak because he isn't smart enough to understand them as even the very rudimentary concept of statistical significance is already too hard and abstract for him – that there has to be a Higgs boson with the mass or other properties that can't differ from the observed ones by more than a relatively small amount. The Standard Model (or any theory with particles including the W- and Z-bosons and others we have known for 30 years) would simply produce inconsistent predictions (such as probabilities of some high-energy collisions exceeding 100 percent) if the Higgs boson weren't there. While an experimenter may view all these arguments as biases and he should perhaps only build on what he has seen with his own eyes, other physicists are more than free (in fact, nearly obliged) to use all the available evidence to decide about the existence of the Higgs boson (as well as any other scientific question). With this additional, mathematically sophisticated evidence added to the mix, there's really no doubt that Nature contains a Standard-Model-like Higgs boson. There's no sensible doubt about millions of other scientific claims, either. But the probability that these insights are right is never quite 100 percent although it has gotten insanely close to 100 percent in very many cases.


snail feedback (15) :

reader Gene Day said...

Without a thorough understanding of statistics it would be impossible to develop new pharmaceuticals. Without pharmaceuticals I would be dead. He denies statistical significance! My God!

reader lukelea said...

Since economists rarely include error bars isn't this evidence that it is not much of a science? I've been mightily impressed by Morgenstern's book on the much neglected measurement problem in economics. Even something as simple as "the" price of an article is difficult to determine and may not even exist. Which is why I think the employment of calculus in economics is spurious: there are no functions, let alone continuous ones, let alone continuous ones which we can measure. I would go further and say that mathematical equations (using the = sign) also have no place in economics except as a heuristic device (Eg. quantity theory of money). The only thing you are left with is the law of diminishing returns in its various manifestations, which is about the shape (convexity, concavity) of certain curves -- oops, not curves, there are no lines, only fuzzy lines whose fuzziness is not based on a normal distribution.

Does this mean that economics is useless? Not at all. You can squeeze a lot out of that little that you have. Adam Smith showed the tendency towards general equilibrium in a free market economy, later refined by the marginal revolution. Also you can make certain predictions involving these two signs: < and >. Just not =

At least this is my considered view of the subject, which I happen to love.

reader Orpheus said...

What about prior probabilities? By Bayes' Theorem, if someone assigns a low enough prior to e.g. the existence of the Higgs boson, that person may still obtain a less than 50% probability that it exists even after a successful 5 sigma test.

One might of course argue that it's implausible to arbitrarily assign extremely low probabilities to scientific hypotheses, but "intuitive plausibility" does not seem to be a very rigorous framework to estimate prior probabilities. Is there some way around this problem or am I seeing things incorrectly?

reader Norpag said...

Lubos You say
" In fact, as we have often emphasized on this blog, all predictions of fundamental theories of physics ultimately have to be probabilistic (even if you remove all the technological limitations of measurement devices etc.) because quantum mechanical postulates have to be universally valid in the whole Universe and every small or large corner of it." and "Statistical significance is absolutely paramount in the verification of hypotheses in all natural sciences as well as all social sciences that more or less successfully try to emulate the scientific character and success of the natural sciences."
I think science should abandon the idea of laws being "valid", which is a human construct that gives an illusion of certainty which most people need. The important thing about laws is simply whether they are useful. This is the root cause of the division between classical physics at one end and quantum mechanics at the other. It's really a question of complexity. Classical physics works by simplifying, idealising and isolating systems; thus Newton-Einstein gravity works well enough with small masses, e.g. the solar system, but does very poorly at the Galactic scale. Similarly, a statistical stochastic approach works well for particle physics and quantum mechanical processes at the other end. In between, at an intermediate level of complexity, neither approach works too well. When studying systems, e.g. in climate science and cosmology, which consist of multiple resonating, oscillatory, interacting variables that probably have a secular evolution, a different approach is required. Such systems are usually inherently untestable and outcomes can only be forecast for relatively small periods of future time by the recognition of patterns (wavelet analysis) that repeat for some periods of time on a scale of interest to humanity. In other words, the "validity" of "Laws" in this area is inherently unknowable and is a meaningless concept. This is why Einstein couldn't come up with a UFT: nature isn't designed in such a way that such a concept is meaningful.

reader SteveBrooklineMA said...

Thanks Lubos, I have been hoping you would comment on this since reading WM Briggs' praise for Ziliak. I think there is plenty of bad science backed up with bad statistics, but these guys are not helping.

reader lukelea said...

If the chance of a two sigma event is 1 out of 20 and a three sigma event is 1 out of 300, how does a 5 sigma event get all the way up to 1 out of 3,000,000?

To betray my ignorance, in IQ studies the average is 100 and the standard deviation is 15. Roughly one out of seven people have an IQ of 115 or higher, and one out of forty-nine 130 or higher. Continuing that trend I was under the impression that you keep multiplying by 1/7 to get the chances for each higher sigma: approximately one out of 350 for 3 sigma, one out of 2500 for four sigma, and one out of 17,500 for 5 sigma, corresponding to an IQ of 175, which is already pretty meaningless or so I gather from something you once said because the tests aren't really that good.

I feel really dumb asking this question but would like to clear up my confusion.

reader Luboš Motl said...

Dear Luke, I wrote the explanation but I understand that in order to explain such things pedagogically, one would have to spend about 30 times more time.

The odds become so extreme with the number of sigmas so quickly because the probability is the integral of exp(-x^2) which is faster than exponential.

Please think about it or try to study some standardized page/introduction to it, like

reader lukelea said...

Thanks. I'll study up.

reader Peter Golian said...

There is always "some" noise factor in nature, and that is why statistics is important. As I say, no two things in the universe are exactly the same. From my basic statistics lectures (in an area other than quantum physics, namely a match-cutting machine), I know that no matches come out at exactly the 30 mm the machine is set to; there is always some difference due to the machine. But if the length is 28-32 mm, it is OK for quality purposes (matchbox properties, burning time). If I translate this to quantum physics, the noise could be due to the performance of the test equipment ... ... But the question is how many sigmas are adequate to prove a theory. I think this is something math should deal with in the future 8)

reader BobSykes said...

Briggs and many other Bayesians reject the frequentist approach in its entirety and regard confidence intervals and p values to be patent absurdities. It's not clear how they would apply Bayes' Theorem to the LHC results. It's also not clear whether Bayesians do not become effective frequentists once the results are in. They seem to be pure Bayesians only for the prior probability.

reader Luboš Motl said...

This justification sounds really crazy because in the normal formulation of the "Bayesian vs frequentist controversy", it's the Bayesians who should accept the notion of probability in a wider, more inclusive set of situations, in particular, they include various really subjective notions of probability.

I can't imagine how someone could reject the frequentist interpretation of probabilities, because it's the more indisputable one.

reader AngularMan said...

In reality even mathematical proofs are not noise-free, since they depend on the verification by humans or computers, and there is always a (very tiny) chance of error in those verifications. They just appear rigorous because the significance is so high.

reader Luboš Motl said...

Exactly but I actually wrote this thing in the text above, too. ;-D

reader AngularMan said...

I was eager to express that thought, and so I wrote the comment before reading the rest of the article.

Sorry :D

reader Albert Zotkin said...

How often does this universe repeat? What is the probability that a universe like this one existed? We can define the properties of this universe as the physical laws and constants that define its symmetries, but we can't test the number of repetitions this universe occurs over the whole number of equivalent repetitions of the probable universes. Therefore, string cosmology or string inflationary models are not statistically testable, so they must be tagged as pseudo-science ;-)