**A textbook pedagogic exercise on chi-squared distributions**

On the 2012 Independence Day, the world of physics has made a transition. The question "Is there a Higgs-like particle with mass near \(126\GeV\)?" has been answered.

*Weinberg's toilet, i.e. Glashow's model of the SM Higgs boson. Explanations below.*

The ATLAS Collaboration gave a 5-sigma "Yes" answer; via the rules of the normal distribution, this translates to the risk "1 in a million" that the finding is noise i.e. that it is a false positive.

Even CMS, despite its inability to use the correct Comic Sans fonts, has been able to do the same thing: 5 sigma discovery, "1 in a million" certainty. If you combine the two independent experiments, the risk of a false positive multiplies: the LHC says that there is a 7-sigma (via the Pythagorean theorem), or "1 in a trillion" (via a product), risk of a false positive. Compare it with 1-in-10 or 1-in-3 or 9-in-10 risk of a false positive that is tolerable in the climate "science" or various medical "sciences".

Once this question is answered, the new question is: What is the right theory that describes the precise behavior of the new particle?

There is one popular candidate: the so-called Standard Model. Steven Weinberg gave it its name and he also constructed much of it. Besides old and well-known things, it also includes one Higgs doublet which gives rise to one physical Higgs polarization.

Various trained and active physicists have worshiped the God particle by rituals that only physicists have mastered so well. Stephen Wolfram called it a hack and this word was actually used by some currently active physicists, too. High school and Nobel trip pal of Weinberg, Shelly Glashow, called it the Weinberg toilet. It's something your apartment needs but you're not necessarily too proud about it.

The Standard Model is popular because it's minimal in a certain sense that is easily comprehensible for humans: it contains a minimum number of such toilets. Whether it's so great for Nature or for a landlord or for a crowded aircraft serving strange combinations of food for lunch to have the minimum possible (consistent) number of toilets remains to be seen; minimizing the number or the sophistication of toilets actually isn't necessarily Nature's #1 priority. Now, however, we may already give an answer to the question:

**How much are the 2011-2012 LHC data compatible with the minimum solution, the Standard Model?**

To answer the question, we must look at somewhat more detailed properties of the new particle and apply a statistical procedure. As an undergraduate freshman, I didn't consider fancy statistics to be a full-fledged part of physics. Can't you measure things accurately and leave the discussion of errors to the sloppy people who make errors? However, the good enough experimenters in Prague forced me to learn some basic statistics and I learned much more of it later. Statistics is needed because even physicists who love accuracy and precision inevitably get results that are incomplete or inaccurate for various reasons.

The text will be getting increasing technical as we continue. Please feel free to stop and escape this blog with a horrified face. Of course, I will try to make it as comprehensible as possible.

**Collecting the data**

We could consider very detailed properties of the new particle but instead, we will only look at five "channels" – five possible recipes for final products into which the Higgs particle may decay (not caring about some detailed properties of the final products except their belonging to one of the five types). We will simply count how many times the LHC has seen the final products of one of the five types that look just like if they come from the Higgs.

*Karel Gott (*1939 Pilsen), a perennial star of the Czech pop-music, "Forever Young" (in German; Czech). The video is from Lu-CERN, Switzerland.*

In each of the five groups, some of them really come from the Higgs; others come from processes without any Higgs that are just able to fully emulate the Higgs. The latter collisions – called the "background" – are unavoidable and inseparable and they make the Higgs search or any other search in particle physics harder.

The LHC has recorded over a quadrillion (1,000 trillions or million billions) collisions. We will only look at the relevant ones. For each of the five groups below, the Standard Model predicts a certain number, plus minus a certain error, and the LHC (ATLAS plus CMS which we will combine, erasing the artificial separation of the collisions into two groups) has measured another number. The five groups are these processes:\[

\eq{

pp&\to h\to b\bar b \\

pp&\to h\to \gamma\gamma \\

pp&\to h\to \tau^+\tau^- \\

pp&\to h\to W^+ W^- \\

pp&\to h\to ZZ

}

\] You see that in all of them, two protons collide and create a Higgs particle, among other things. This particle quickly decays – either to a bottom quark-antiquark pair; or two photons; or two tau-leptons; or two W-bosons; or two Z-bosons.

Now open the freshly updated Phil Gibbs' viXra Higgs combo java applet (TRF) and play with the "Channel" buttons only, so that you don't damage this sensitive device. ;-)

If you're brave enough, switch the button from "Official" to "Custom" and make five readings. In each of them, you just check one of the boxes corresponding to the five channels above; \(Z\gamma\) yields no data and we omitted this "sixth channel". Then you look at the chart you get at the top.

Oh, I forgot, you must also switch "Plot Type" (the very first option) from "Exclusion" to "Signal" to make things clear. The graphs then show you whether you're close to the red, no-Higgs prediction or the green, yes-Higgs prediction.

In each of the five exercises, you move your mouse above the Higgs mass \(126\GeV\) or, if not possible, \(125\GeV\) which is close to the \(126\GeV\) that the LHC currently predicts (the average of ATLAS and CMS). A small rectangle shows up and you read the last entry, "xsigm". It tells you how much the combined LHC data agree with the yes-Higgs prediction of the Standard Model. "xsigm=0" means a perfect agreement, "xsigm=-1" means that the measured number of events is one standard deviation below the predictions, and so on. You got the point. Of course, you could read "xsigm" from the graph itself.

For the five processes, you get the following values of "xsigm":\[

\begin{array}{r|l}

\text{Decay products}& \text{xsigm}\\

\hline

b\bar b & +0.35 \\

\gamma\gamma& +2.52 \\

\tau^+\tau^-& -1.96 \\

W^+ W^- & -1.75 \\

Z^0 Z^0 & +0.32

\end{array}

\] I added the sign to indicate whether the observations showed an excess or a deficit; I am not sure why Phil decided to hide the signs. You see that only the \(\gamma\gamma\) i.e. diphoton channel has a deviation greater than two standard deviations. But there are two others that are close to two standard deviations, too: a deficit of ditau events and the W-boson pairs.

On the other hand, there are five channels which is a rather high number: it matches the number of fingers on a hand of a vanilla human. With a high number of entries, you expect some entries to deviate by more than one sigma by chance, don't you? How do we statistically determine how good the overall agreement is?

**We will perform the so-called chi-squared test**

Because both signs indicate a discrepancy between the Standard Model and the observations, we will square each value of "xsigm" and sum these squares. In other words, we will define\[

Q = \sum_{i=1}^5 {\rm xsigm}_i^2.

\] The squaring is natural for general mathematical reasons; it's even more natural if you realize that \({\rm xsigm}\) are actually quantities distributed along the normal distribution with the standard deviation of one and the normal distribution is Gaussian. It has \(-{\rm xsigm}^2/2\) in the exponent.

Fine, so how much do we get with our numbers?\[

\eq{

Q&=0.35^2+2.52^2+1.96^2+1.75^2+0.32^2\\

Q&= 13.48

}

\] The result is approximate. Yes, the signs cancelled here, anyway.

At this moment, we would like to know whether the \(Q\) score is too high or too low or just normal. First, it is clearly too high. It's because \(Q\) was a sum of five terms and each of them was expected to equal one in average; this "expectation value of \({\rm xsigm}^2\)" is what defines the standard deviation. So the average value of \(Q\) we should get is \(5\). We got \(13.48\).

How bad the disagreement is? What is the chance that we get \(13.48\) even though the expected average is just five?

**Looking at the \(\chi^2\) distribution**

To answer this question, we must know something about the statistical distribution for \(Q\). We know that the average is \(\langle Q\rangle=5\) but we need to know the probability that \(Q\) belongs to any interval. By definition of the \(\chi^2\)-distribution, the quantity \(Q\) defined as the sum of

*squares of*(thanks, JP) five normally distributed quantities (with a vanishing mean and with the unit standard deviation) is distributed according to the chi-squared distribution. If you prefer mathematical symbols,\[

Q\sim \chi_5^2.

\] The subscript "five" is the number of normally distributed quantities we're summing. This distribution is no longer normal. After all, the negative values are strictly prohibited, something that is unheard of in the world of normal distributions.

The probability density function (PDF) for this distribution – and the PDF is what we normally call "the distribution function" – may be explicitly calculated by composing the five normal distributions in a certain way and computing the resulting integral in spherical coordinates. The result is\[

\eq{

d\,Prob(Q\in(Q,Q+dQ))&= dQ\cdot \rho(Q)\\

\rho(Q) &= \frac{Q^{k/2-1}\exp(-Q/2)}{2^{k/2}\Gamma(k/2)}

}

\] The exponent in the exponential is just \(-Q/2\) without squaring because it comes from the Gaussians but \(Q\) already includes terms like \({\rm xsigm}^2\). The additional power of \(Q\) in the numerator arises from the integral over the four angular spherical coordinates in the 5-dimensional space. The remaining factors including the Gamma-function are needed to normalize \(\int\rho(Q)\dd Q=1\).

Now, the cumulative distribution function (CDF) – the probability that \(Q\) is found in the interval \((-\infty,Q)\) for some value of \(Q\) (hope you won't confuse the \(Q\)'s: there's no real potential for confusion here) is an integral of the PDF and it may be expressed in terms of some special function. Then you just substitute the numbers to get the probability.

However, such integrals may be calculated directly if you have Wolfram Mathematica 8. For example, if you ask the system to compute

Probability[q > 5., Distributed[q, ChiSquareDistribution[5]]]you will be told that the result is 41.588%, not far from the naive "50%" you would expect. The deviation arises because 5 is really the average and that's something else than the "median" which is the value that divides equally likely (50%) intervals of smaller and larger values.

Now, you just modify one number above and calculate

Probability[q > 13.48, Distributed[q, ChiSquareDistribution[5]]]The result is \(0.019\) i.e. \(1.9\%\). The probability is about 1-in-50 that the \(Q\) score distributed according to this \(\chi^2\) distribution is equal to \(13.48\) or it is larger. How should you think about it?

**Counting civilizations**

Imagine that the Standard Model is exactly correct. There are millions of planets in the Universe. Each of them has an LHC that has accumulated – if we count the combination from \(7\TeV\) as well as \(8\TeV\) collisions and from the year 2011 as well as 2012 after their own Jesus Christ – about 20 inverse femtobarns of high-energy proton-proton collisions.

There is an Internet on each planet, invented by their own Al Gore. If I have horrified you by the idea of a million of Al Gores, let me mention that there is also The Reference Frame on each planet that has processed the local LHC results and calculated \(Q\). However, the collisions were random so they also got a different number of events of various kinds and a different value of \(Q\).

The probability \(1.9\%\) means that \(1.9\%\) of the planets found \(Q\) that was at least \(13.48\). So if you believe that the Standard Model is correct, you must also believe that by chance, we belong among the top \(1.9\%\) of the planets that were just "unlucky" enough because their LHC measured a pretty large deviation from the Standard Model.

Is that shocking that by chance, we belong among the "top \(1.9\%\) offenders" within the planetary physics communities? This value of \(1.9\%\) is slightly greater than a 2-sigma signal. So you may say that the group of five channels we have looked at provides us with a 2-sigma evidence or somewhat stronger evidence that the Standard Model isn't quite right.

This is too weak a signal to make definitive conclusions, of course. It may strengthen in the future; it may weaken, too. After the number of collisions that is \(N\) times larger than those that have been used so far, the number "2" in this overall "2 sigma" will get multiplied roughly by \(\sqrt{N}\) if the deviation boils down to Nature's real disrespect for the exact laws of the Standard Model.

However, it will be mostly moving between 0 and 1.5 sigma in the future (according to the normal distribution: a higher spurious confidence level in sigmas would drop there following the function \(CL/\sqrt{N}\)) if the deviation from the Standard Model we observe and we quantified is due to noise and chance. It's too early to say but the answer will be getting increasingly clear in the near future. For example, if the signal is real, those overall 2 sigma may grow to up to 4 sigma after the end of the 2012 run.

Stay tuned.

Lubos, are you able to do the statistics better in the medical "sciences"? You want to do clinical studies on a 1 million population sample to get your 5-sigma accepted p-level? Who do you think will pay you for that? Nobody will. So we do exactly what you did with the CMS and ATLAS results. We combine the results of several smaller studies to get a stronger statistical evidence, it is called metaanalysis

ReplyDeleteThis is complete bullshit what you're writing, Mephisto.

ReplyDeleteIf you get a 2-sigma signal in a medical study, the sample needed for a 5-sigma signal assuming that the signal was actually real is just 2.5^2 = 6.25 times larger, not one million times larger.

It's enough to multiply the sample by six and you will switch from a 5% risk of a false positive (false discovery) to a 1-in-a-million risk of a false positive.

Who will fund this 6-times-larger samples? Anyone who funds science. Who is funding the smaller samples and scientists who are satisfied with 2-sigma signals? Sponsors of junk scientists, pseudoscientists, and hacks like you who are ignorant about basic science and basic statistics.

ReplyDeletea 1 million population sampleDo you mean a sample size of 1 million, or a sample from a population of 1 million? I hope you mean sample size, but it's kinda hard to explain why. :-)

I do not wish do spam you very interesting blog about the statistics of collider experiments. So feel free the remove the first as well as this second (and last comment of mine). But I think you underestimate it a little. In every university hospital, there are professional statisticians who are responsible for the results and I believe they know what they are doing. Every request for a research grant needs a calculation of a sample size needed to provide sufficient statistical power and this is done by statisticians. (google something like "Department of medical biostatistics", you find it at every university doing research)

ReplyDeleteThey may be departments and you may give them pompous names but they're still not doing a good job because we still don't know the answers to many common questions about the health impact of various things although lots of tests have been done.

ReplyDeleteIf Kolmogorov-Smirnov test is used, then one gets p-value = 0.43.

ReplyDeleteThanks for the expert physics commentary and statistics refresher. I had forgotten what chi-squared was XD

ReplyDeleteBelieve me those statisticians are remarkably clueless, and mostly blindly apply statistics. I have analysed one such study myself and found it to be flawed and told everyone.

ReplyDeleteResult: The statistician never replied to me, the concerned medical researchers replied to my arguments with "it's correct because it was done by a statistician.", the others didn't understand statistics or thought it is a minor point or that i should be more constructive...

I am convinced that 80-90% of all studies are wrong... and one of the reason is that those bad researcher train bad researchers train bad researchers.

Thanks for this nices reminder of the Chi-squared test Lumo, and I quite like its overall result for the actual deviation from the SM, using these five decay channels :-)

ReplyDeleteDear Lubos, could you please make the same analysis for 1st data-block presented in last press-release (2011) and compare it channel-by-channel with the current results ?

ReplyDeleteI would like to know by this calculations if there is a trend when comparing the old 7-Tev data with 7-Tev + 8-Tev data sets ... maybe it would help us to see a real preliminar deviation from SM or just a statistical fluctuation expected to die till 12/2012.

Thank you for doing this calculation.

ReplyDeleteAre you confident that there are no covariances between the different experiments? I don't know how the experiments work, but maybe whatever causes a deficit in one channel causes a surplus in other channels?

Dear Numcracker, you're invited to calculate it with the old numbers if you have them.

ReplyDeleteThe more complete data are clearly superior. You won't learn anything interesting by looking at a weaker dataset. The 2011 collisions are just a random subset of the 2011+2012 collisions.

The word "trend" is misleading because it suggests that one could extrapolate it. But this is complete nonsense, of course. Much of the current deviations are clearly noise and the proportion of noise was even higher after 2011, in a smaller dataset, and noise obviously can't be extrapolated via trends.

A repeatable negative correlation between CMS and ATLAS? Very exciting proposal. How would this miracle exactly occur? ;-)

ReplyDeleteI can imagine positive correlations. A part of the systematic errors are shared by the whole LHC. Most of the error here is statistical, I hope, and it is not correlated between CMS and ATLAS.

Some collisions are measured by the CMS device and methods, some by ATLAS, but this separation may be completely removed. Just think about it.

... but only CMS has a TD which can negatively influence their results ... :-P

ReplyDeleteIn, "the quantity Q defined as the sum of five normally distributed quantities", "squares of" is missing.

ReplyDeleteThanks, fixed, a small credit to you added.

ReplyDeleteDear Lubos, after reading your post one tends to believe new particles would be right to the corner, as proposed today in arXiv:1207.1445.

ReplyDeleteHowever, it may be that LHC's data analysis was made in a too naive way. Consider for instance arXiv:1207.1451 for QCD PDF corrections.

What is your opinion/bet on this issue ? Is this gamma-gamma signal real or just result from usual suspects ?

Dear NumCracker, the blog entry newer "by one" is about 1207.1445.

ReplyDeleteI am completely uncertain - literally 50 to 50 - on whether or not the deviations are compatible with the Standard Model and whether they have to go away after more complete data or more precise predictions.

If the deviations are real, it's rather hard to get new physics that is compatible with the suppressed diphoton channel etc., as argued in 1207.1445 and probably in other papers to emerge soon.