Mel B. has sent me a link pointing to a rather incredible attack by an economics professor on the statistical methods in science that was published in the Financial Post:
Statistical significance is junk science, and its big piles of nonsense are spoiling the research of more than particle physicists.
Wow. It's remarkable because with this deep misunderstanding of the very key part of any rational thinking, this gentleman can't possibly understand anything about the proper verification of theories in economics, his field, either. I would argue that because of this lethal flaw in the author's approach to rational reasoning, it is guaranteed at 5 sigma that your humble correspondent and many other physicists and scientists simply have to be better economists than Mr Ziliak, too. He just can't have a clue about the scientific approach to anything.
Statistical significance is absolutely paramount in the verification of hypotheses in all natural sciences as well as all social sciences that more or less successfully try to emulate the scientific character and success of the natural sciences.
Only in mathematics can we construct rigorous proofs that don't need to mention any probabilities because, in principle, the probability that a mathematical proof is right may be verified to be 100 percent. There's no noise and no uncertainty in a rigorous mathematical proof.
However, this "optimistic observation" has two major limitations. One of them is that mathematics doesn't directly apply to the real world. As long as mathematical concepts, theorems, and their proofs are considered rigorous, they can't be reliably and accurately identified with anything in the real world. So they tell us nothing about Nature, humans, or society. Claims about Nature, humans, or society simply don't belong to mathematics. They can't be absolutely certain. They can't be rigorous in the truly mathematical sense.
The second limitation is that people aren't infallible, so for various reasons even a mathematical proof has a nonzero probability of being wrong. Even if a proof is carefully verified, there's always a nonzero probability that the brain or the computer performed an invalid operation that led to the confirmation of a proof that is actually erroneous. The embedding of mathematicians' brains in Nature guarantees that these brains can't quite share the perfectly clean, infallible features of the idealized world of mathematics.
In natural sciences, the verification and falsification of hypotheses – and falsification in particular is the basic methodology that makes observations relevant (and observations have to be relevant for anything that we call science) – always involves measurements that have some uncertainty, a nonzero error margin, or a risk that a phenomenon is caused by different causes than those we want to search for. This is a fact: the world is simply messy and complicated. It is partly unpredictable. It is not a clean and transparent celestial sphere with perfectly spherical angels.
We may develop mathematical models and theories that are meant to match the observations, and they may be free of any remarks about error margins, backgrounds, or false positives. But as soon as we do anything that remotely involves the theories' verification (and in sciences, the verification ultimately boils down to empirical verification), we simply have to acknowledge that each measured quantity has a nonzero error margin because it can't be measured with perfect accuracy. We must also acknowledge that an event that looks like a proof of some new phenomenon predicted by a theory may actually have been caused by a more mundane (while perhaps rarer and less likely) effect that combines the known mechanisms.
We must not only acknowledge it but we must also quantify all these things. We must know whether the error margin of a measurement is small enough that the measurement is useful and trustworthy concerning the validity of a proposition. In the same way, we must know whether it's conceivable that the event apparently proving a new effect is actually caused by an older, less extraordinary theory combined with some reasonable amount of good luck.
For all these things, we have to quantify the probabilities.
The Higgs boson was officially discovered once the pairs of photons or Z-bosons with the right energies, which really look like they come from a new particle with a mass of 125-126 GeV, were so numerous that such a spike in the number of these events was very unlikely to appear without a new particle. By "very unlikely", particle physicists mean a chance of about "1 in 3 million", also known as "5 sigma", that the excess was a fluke that appeared in a world without a new particle.
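The "spike in the number of events" argument can be made concrete with a toy counting experiment. The sketch below assumes a single hypothetical bin in which the null hypothesis predicts a Poisson-distributed number of background events with a known mean; the numbers are made up for illustration, not LHC data, and real analyses combine many channels and systematic uncertainties. It computes the probability that the background alone fluctuates up to at least the observed count and converts that probability into an equivalent one-sided number of sigmas.

```python
import math
from statistics import NormalDist


def poisson_tail(n_observed: int, background: float) -> float:
    """P(N >= n_observed) for N ~ Poisson(background): the chance that the
    background alone produces at least the observed number of events.
    Meant for modest backgrounds; exp(-background) underflows for huge ones."""
    if n_observed <= 0:
        return 1.0
    term = math.exp(-background)  # P(N = 0)
    cdf = term
    for k in range(1, n_observed):
        term *= background / k  # P(N = k) computed from P(N = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)  # complement of P(N <= n_observed - 1)


def p_to_sigma(p: float) -> float:
    """One-sided Gaussian significance equivalent to the probability p."""
    return NormalDist().inv_cdf(1.0 - p)


# Hypothetical bin: 10 background events expected, 20 observed.
p = poisson_tail(20, 10.0)
print(f"fluke probability = {p:.4f}, about {p_to_sigma(p):.1f} sigma")
```

Such a doubling of the expected count is suggestive but nowhere near the 5-sigma threshold; the fluke probability only becomes negligible once the excess grows well beyond the size of the background's natural fluctuations.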
Some disciplines of science try to be as hard and reliable as particle physics, so they adopted the same 5-sigma (1 in 3 million) standard for discovery; most other disciplines, especially soft sciences such as medical research, climate science, psychology, and others, are often satisfied with 3-sigma (roughly 1 in 370) or even 2-sigma (roughly 1 in 20) evidence.
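To see what a 2-sigma or 3-sigma standard means in practice, here is a small Monte Carlo sketch, assuming each hypothetical "null experiment" yields a standardized result drawn from a standard normal distribution, i.e. a world with no new effect at all. It counts how often pure noise lands more than a given number of sigmas from zero in either direction. With that two-sided convention, a 2-sigma fluke shows up in roughly 1 in 22 experiments and a 3-sigma fluke in roughly 1 in 370, while a 5-sigma fluke is too rare to show up reliably even in a couple hundred thousand trials.

```python
import random


def false_positive_rate(threshold_sigma: float, trials: int = 200_000,
                        seed: int = 1) -> float:
    """Fraction of pure-noise 'experiments' whose standardized result lies
    more than threshold_sigma away from zero, in either direction."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = sum(1 for _ in range(trials)
               if abs(rng.gauss(0.0, 1.0)) > threshold_sigma)
    return hits / trials


for z in (2, 3, 5):
    print(f"{z} sigma: {false_positive_rate(z):.5f} of null experiments exceed it")
```

If many groups run 2-sigma studies of effects that don't exist, about one in twenty of them will still report a "finding".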
The number of sigmas measures the deviation from the null hypothesis. A null hypothesis is some simple enough explanation "without new players" that admits some controllable noise according to some calculable statistical treatment. Suppose it predicts that a quantity \(X\) has the value \(X_0\pm \Delta X\) where the distribution is normal (and it is very often almost exactly normal; even if it is not, we usually know what it looks like and we can calculate the probabilities for other distributions as well), i.e. the probability density is \(C\times \exp[-(X-X_0)^2/(2\,\Delta X^2)]\), where \(C=1/(\sqrt{2\pi}\,\Delta X)\) is chosen so that the "total probability of any possibility" equals one. Then it is possible to calculate that the probability that \(X\) fluctuates above \(X_0+5\Delta X\) is approximately 1 in 3.5 million, while the probability that \(X\) lies anywhere outside the interval \((X_0-5\Delta X,X_0+5\Delta X)\) is twice as large, about 1 in 1.7 million. Such probabilities are so tiny that physicists are willing to take the risk and announce the discovery.
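The tail probabilities of the normal distribution can be computed directly from the complementary error function. A minimal sketch using only Python's standard library: integrating the normal density above \(X_0+n\Delta X\) gives \(\tfrac12\,\mathrm{erfc}(n/\sqrt2)\) (an upward fluctuation only, the usual convention for an excess of events), and doubling it gives the probability of landing outside \((X_0-n\Delta X,X_0+n\Delta X)\).

```python
import math


def one_sided_tail(n_sigma: float) -> float:
    """P(X > X0 + n*dX) for a normal distribution: an upward fluctuation."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


def two_sided_tail(n_sigma: float) -> float:
    """P(|X - X0| > n*dX): X falls outside (X0 - n*dX, X0 + n*dX)."""
    return math.erfc(n_sigma / math.sqrt(2.0))


for n in (2, 3, 5):
    print(f"{n} sigma: one-sided 1 in {1 / one_sided_tail(n):,.0f}, "
          f"two-sided 1 in {1 / two_sided_tail(n):,.0f}")
```

The result does not depend on \(X_0\) or \(\Delta X\), only on the number of sigmas, which is why "5 sigma" can serve as a field-independent standard.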
The total significance of the deviation from the Higgs-less null hypothesis is now around 10 sigma or so, which makes us really sure that the Higgs-like excess isn't just a fluke. The probability that the excess is just a fluke (a collection of coincidences) is much smaller than 1 in a quadrillion. These odds are so extreme because \(\exp(-x^2)\) decreases really quickly with \(x\), more quickly than exponentially, in fact.
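The faster-than-exponential decay is easy to check numerically. If the tail fell merely exponentially, each extra sigma would make a fluke rarer by a constant factor; for the Gaussian tail, that factor itself keeps growing. The sketch below uses the one-sided tail \(\tfrac12\,\mathrm{erfc}(n/\sqrt2)\) and the hypothetical helper name `step_ratios`; at 10 sigma the fluke probability comes out around \(7.6\times10^{-24}\), far below one in a quadrillion (\(10^{-15}\)).

```python
import math


def one_sided_tail(n_sigma: float) -> float:
    """P(X > X0 + n*dX) for a normal distribution."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


def step_ratios(start: int, stop: int) -> list[float]:
    """p(n - 1) / p(n) for n = start + 1 .. stop: how many times rarer a
    fluke becomes with each additional sigma."""
    return [one_sided_tail(n - 1) / one_sided_tail(n)
            for n in range(start + 1, stop + 1)]


for n in range(5, 11):
    print(f"{n:>2} sigma: p = {one_sided_tail(n):.2e}")
print("step ratios 5 -> 10 sigma:", [round(r) for r in step_ratios(5, 10)])
print("below 1 in a quadrillion?", one_sided_tail(10) < 1e-15)
```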
When the discrepancy between a theory and the observation becomes this high, we may eliminate the null hypothesis (in this case, a crippled Standard Model where the Higgs is amputated). This is the process of falsification and it's the key empirically rooted procedure by which any science makes progress in its ability to distinguish viable hypotheses from the disproved ones. To disprove a (null) hypothesis is this straightforward. On the other hand, we can never "quite prove" any detailed theory because there's always a possibility (and, with the exception of the truly final theory, pretty much a certainty) that more extensive and accurate experiments in the future will falsify the latest best theory, too. Equivalently, the absence of a statistically significant (e.g. 2-sigma or 5-sigma) deviation in the latest data doesn't mean that the null hypothesis is right and will be right forever. It just means that the deviations seen in the experiments performed so far are smaller than a certain bound, which implies that the current theory is "practically" correct. In the future, a discrepancy may be found in more accurate, refined, or extensive experiments that may see tinier or subtler effects than what we can see today.
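What "smaller than a certain bound" means can be illustrated with the simplest possible convention. The sketch below assumes a single Gaussian measurement of a deviation from the null hypothesis (the value 0.3 ± 0.2 and the helper name `gaussian_upper_limit` are hypothetical) and reports a one-sided upper limit at a chosen confidence level. Real analyses often use more careful constructions, so treat this as an illustration only.

```python
from statistics import NormalDist


def gaussian_upper_limit(measured: float, sigma: float, cl: float = 0.95) -> float:
    """One-sided upper bound on a deviation, assuming the measurement is
    Gaussian with standard deviation sigma."""
    return measured + NormalDist().inv_cdf(cl) * sigma


# Hypothetical measured deviation from the null hypothesis: 0.3 +/- 0.2
d, s = 0.3, 0.2
print(f"significance: {d / s:.1f} sigma (not a discovery)")
print(f"95% upper limit on the deviation: {gaussian_upper_limit(d, s):.3f}")
```

A more precise future experiment shrinks `sigma`, which tightens the bound; it may eventually reveal a deviation that the current data can only constrain.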
One simply can't ever deduce any conclusions from the empirical data with absolute certainty. It's always important to acknowledge that an uncertainty is there. And because such an uncertainty may compromise the conclusions, it's always important (sometimes more important, sometimes less important, but never quite forgettable) to quantify the uncertainty, i.e. to know how large it is. The most invariant way of quantification is ultimately one in terms of the probability that a conclusion is invalid because an anomalous observation or a "smoking gun" wasn't really caused by the new effect whose existence we wanted to prove but rather by some good luck (or bad luck): an amount of luck that can't be small (because, as we assume, the observation doesn't look like the most typical prediction of the null hypothesis) but that isn't so large that it couldn't realistically happen.
All this methodology is absolutely essential for any controlled, reliable enough empirical tests of any theory or any hypothesis in any natural or social science. We may only discuss how high our certainty should be for us to authoritatively claim that our experiments or observations have established something (the requirements may depend on the context a little bit). 5-sigma is the usual standard of the hardest sciences (led by particle physics) for a discovery. It wouldn't hurt if other sciences adopted the same standards. When a dataset produces a 2-sigma excess, which still carries a substantial "1 in 20" risk of a false positive, you only need a dataset 6.25 times larger (2.5 squared) to achieve a 5-sigma excess where the risk of a false positive is just "1 in 3 million", assuming the effect is real, because the significance of a genuine signal grows with the square root of the amount of data. I am confident that science would be much clearer if surveys with mere 2-sigma excesses were summarized as inconclusive ones. Lots of bad and questionable results in soft sciences are caused by their low standards on how many sigmas they need. These bad apples have far-reaching consequences because many other papers try to build on them, and so on.
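The factor of 6.25 follows from how a counting significance scales with the amount of data. The sketch below uses the simple estimate \(S/\sqrt{B}\) for \(S\) excess events over an expected background \(B\), and assumes the effect is real so that both \(S\) and \(B\) grow in proportion to the dataset. The numbers (20 excess events over a background of 100) are hypothetical.

```python
import math


def counting_significance(signal: float, background: float) -> float:
    """Simple S / sqrt(B) estimate of the significance of an excess."""
    return signal / math.sqrt(background)


def data_needed(current_sigma: float, target_sigma: float) -> float:
    """Factor by which the dataset must grow, assuming significance ~ sqrt(N)."""
    return (target_sigma / current_sigma) ** 2


# Hypothetical analysis: 20 excess events over an expected background of 100.
s, b = 20.0, 100.0
k = data_needed(counting_significance(s, b), 5.0)
# Signal and background both scale with the amount of data k.
print(k, counting_significance(k * s, k * b))  # prints 6.25 5.0
```

If the 2-sigma excess was only a fluke, the extra data do not amplify it; the apparent signal tends to fade instead, which is exactly what the stricter standard is meant to catch.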
But if someone wants to abandon the null hypothesis testing and the notion of statistical significance in general, he is surely throwing out the baby with the bath water. He can't possibly understand how proper science is done; he couldn't have possibly done any empirical research that could be uncontroversially considered scientific. In fact, as we have often emphasized on this blog, all predictions of fundamental theories of physics ultimately have to be probabilistic (even if you remove all the technological limitations of measurement devices etc.) because quantum mechanical postulates have to be universally valid in the whole Universe and every small or large corner of it.
Mr Ziliak tries to excuse his silly remarks by some confusing assertions about the nature of particle physicists' claims about the Higgs boson. The 5-sigma excess doesn't prove the Higgs boson, he says: it could be a Prometheus particle, too. But if he's serious, he misunderstands what terminology means in physics – and science. You are free to use the name "Prometheus" for the Higgs boson; after all, many of us use many other names at various points, such as the God particle or the BEH boson (only Peter Higgs really noticed the extra bosonic excitation named after him). But while the people are free to choose their language and terminology, physics isn't about terminology. Physics is about the observable phenomena. So even if the source of the bump were Prometheus according to your terminology and your belief system, it's still empirically demonstrated that this Prometheus behaves as the Higgs boson. If it looks like a God particle, walks like a God particle, and barks like a Dog particle, then it is a God particle (if you change one Dog to God). It doesn't matter whether someone says it's a Prometheus, too.
At the beginning, the new particle was given uncertain names and it was called Higgs-like because there was clearly a new particle-like effect and its properties were compatible with the properties of a Higgs boson. Later, as we became more certain and knew more accurate values of the properties, we were able to falsify the theory that the bump is caused by something that differs too much from the Standard Model Higgs boson. At this point, we have everything we need to call it the Standard Model Higgs boson. By this claim, we don't mean that the Standard Model will forever be the right and complete theory for all observations. It almost certainly won't be. But the observed properties of the Higgs boson falsify so many competing hypotheses and are so nontrivially close to the predictions of the Standard Model Higgs boson that there's no reason not to use this name for the object. So the new particle may be a Prometheus but according to the physical definition of "being a Higgs boson", it is clearly a Higgs boson, too. Physics determines whether something is a Higgs boson by its decays, rates of production, mass, and other interactions, and if those things agree with the Higgs boson's properties, then the particle (whether it is God or Prometheus or anyone else) simply is a Higgs boson, and attempts to claim otherwise are just artifacts of a distorted terminology, mistakes, and demagogy.
When I talked about the certainty that the LHC has observed a new particle; a new Higgs-like particle; or a Standard-Model-like Higgs boson (these phrases are increasingly accurate and increasingly strong), I only took the (almost) purely experimental data into account. Aside from these nearly direct observations, we have nearly rock-solid theoretical arguments – that I won't offer to Mr Ziliak because he isn't smart enough to understand them as even the very rudimentary concept of statistical significance is already too hard and abstract for him – that there has to be a Higgs boson with the mass or other properties that can't differ from the observed ones by more than a relatively small amount. The Standard Model (or any theory with particles including the W- and Z-bosons and others we have known for 30 years) would simply produce inconsistent predictions (such as probabilities of some high-energy collisions exceeding 100 percent) if the Higgs boson weren't there. While an experimenter may view all these arguments as biases and he should perhaps only build on what he has seen with his own eyes, other physicists are more than free (in fact, nearly obliged) to use all the available evidence to decide about the existence of the Higgs boson (as well as any other scientific question). With this additional, mathematically sophisticated evidence added to the mix, there's really no doubt that Nature contains a Standard-Model-like Higgs boson. There's no sensible doubt about millions of other scientific claims, either. But the probability that these insights are right is never quite 100 percent although it has gotten insanely close to 100 percent in very many cases.