Friday, October 10, 2014

CMS, ATLAS metaexperiment: deficit of deficits

Important update: I have received a message from the authors and they confirmed, as I was somewhat afraid in the original blog post below, that the channels with exactly 0 events were mishandled. A newer version of the paper will appear on Monday October 13th, with just somewhat weaker results for one detector but stronger for the other, and thanking me. ;-) There may still be another bug but let's wait, they will find it if it is so.
Fewer poor people kind of means that the society is rich, right? ;-)

Benjamin Nachman of SLAC and Tom Rudelius of Harvard published an amusing piece of comparative literature:
A Meta-analysis of the \(8\,\mathrm{TeV}\) ATLAS and CMS SUSY Searches
Able to produce papers like that, they should be named professors of comparative literature. In fact, they're better than the average professors of comparative literature because those usually don't know lognormal distributions and related concepts.

I count them into this field because one really doesn't need to know any physics (what the experiments measure, how they measure it, how the predictions of what they should observe are made, and so on) – it's enough to read many papers and to know how to (statistically) compare their results.

The interval \((0,1)\) for \(p\) is divided into ten bins; the red bars appear to the left of the blue bars just to make the diagram more readable.

Here is the quick summary. They've looked at 17 ATLAS preprints and 12 CMS preprints that search for signs of supersymmetry using the bulk of the 2012 collisions, and statistically analyzed which of them showed excesses – more observed collisions of a special type than expected – and deficits – fewer observed collisions than expected.

While no particular paper sees "more than 3-sigma" excesses in a particular search, one may still ask whether collectively, the 29 papers see more excesses than expected by chance, or fewer excesses than expected by chance.

The barcharts above show the expected (red) and observed (blue) number of results with a certain \(p\)-value (which is on the \(x\)-axis). They have one more set of barcharts for deficits; this one is for excesses. The top and bottom rows are for two different distributions used in the prediction (they don't differ much, anyway); the left and right columns are ATLAS and CMS, respectively.

The basic idea is that there are always statistical fluctuations but if you look at many results, the distribution of the statistical fluctuations itself should converge to a certain limiting one (despite the fact that you are comparing apples and oranges). You should see excesses approximately as often as deficits, higher deficits and surpluses should be rarer than modest ones by some amount, and so on.
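To make the idea of a limiting distribution concrete, here is a minimal sketch (not the authors' procedure) of what the red bars look like in the simplest setting one can imagine: a single counting channel with a known expected background `b`, pure Poisson fluctuations, and no systematic uncertainty. The function names and the values of `b` are assumptions made for illustration; the paper also folds in uncertainties on the background, which this toy ignores.

```python
import math

def poisson_pmf(n, b):
    """P(N = n) for a Poisson variable with mean b > 0."""
    return math.exp(-b + n * math.log(b) - math.lgamma(n + 1))

def excess_p_value(n_obs, b):
    """One-sided p-value P(N >= n_obs) under the background-only hypothesis."""
    return max(0.0, 1.0 - sum(poisson_pmf(k, b) for k in range(n_obs)))

def expected_bin_fractions(b, n_bins=10):
    """Exact probability that the excess p-value lands in each of n_bins equal bins."""
    fractions = [0.0] * n_bins
    # Sum far enough into the upper tail that the neglected mass is negligible.
    n_max = int(b + 10 * math.sqrt(b) + 20)
    for n in range(n_max + 1):
        p = excess_p_value(n, b)
        idx = min(int(p * n_bins), n_bins - 1)  # p = 1 goes into the last bin
        fractions[idx] += poisson_pmf(n, b)
    return fractions

# Hypothetical backgrounds: one large (nearly continuous), one small (very discrete).
for b in (100.0, 0.5):
    print(b, [round(f, 3) for f in expected_bin_fractions(b)])
```

With a large background the ten fractions come out close to 0.1 each, which is the flat distribution one naively expects. With `b = 0.5` the distribution is lumpy, and about 61% of the probability sits in the last bin, simply because observing zero events gives \(p=1\) exactly.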

You may see that the red barcharty-graph is smooth and predictable; the blue one is noisy in all cases. The agreement on the left side of the graphs is OK enough. But in the last bin, if they did it right, the (red) expectations always seem to be vastly higher than the (blue) measured reality.

Note that the right side of each graph corresponds to \(p\to 1\), which means "an extremely lousy degree of confidence that the new effect exists" and in practice, it means measurements that should have shown (red) or have shown (blue) the largest deficits of events. Many experiments with deficits are predicted but almost none of them is seen!

They could have made a mistake with the expected statistical distributions (the blue and red curves look really, really different: I can imagine that the mistake was to completely overlook the bins with zero collisions, which sometimes don't have any "crosses" in the experimental graphs! Didn't they neglect that the number of events is discrete, a non-negative integer?) but if they haven't, it's a piece of evidence that there is some "overall" surplus of interesting events (or that there was a mechanism in play that prevented the channels from ending up with zeroes) – but it is a surplus in the channels that aren't strong enough to produce statistically significant excesses individually. ;-)
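Here is a toy illustration of why the zero-event channels matter so much for the last bin. It assumes pure Poisson counting and assumes that channels with zero observed events are simply dropped; that is only one hypothetical way of mishandling them, and the list of backgrounds is invented for the example. A channel that sees zero events has the excess \(p\)-value \(P(N\geq 0)=1\), so it always lands in the last bin. Once those channels are gone, the largest \(p\)-value left is \(P(N\geq 1)=1-e^{-b}\), which reaches \(0.9\) only for \(b\geq \ln 10\approx 2.3\).

```python
import math

def largest_p_without_zeros(b):
    """Largest excess p-value reachable if n_obs = 0 channels are dropped: P(N >= 1)."""
    return 1.0 - math.exp(-b)

def expected_zero_channels(backgrounds):
    """Expected number of channels observing exactly 0 events (each has p = 1)."""
    return sum(math.exp(-b) for b in backgrounds)

# Hypothetical expected backgrounds for a handful of low-count signal regions.
backgrounds = [0.3, 0.8, 1.5, 2.0, 4.0]

lost = expected_zero_channels(backgrounds)
can_reach_last_bin = [b for b in backgrounds if largest_p_without_zeros(b) >= 0.9]

print(f"expected number of zero-count channels: {lost:.2f}")
print("backgrounds that can still populate p >= 0.9:", can_reach_last_bin)
```

In this toy, most of the low-background channels cannot reach the last bin at all once their zero-count outcomes are discarded, so the observed histogram would show a "deficit of deficits" without any new physics.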

Well, a "surplus of excesses" would have probably been a stronger piece of evidence, but a "deficit of deficits" is qualitatively good enough, too. In fact, I am not quite sure whether the presence of a weak signal across the papers should manifest itself earlier as a "surplus of excesses" or as a "lack of deficits".

The authors urge the ATLAS and CMS teams to publish some extra statistical details in their papers so that metaanalyses similar to this one could become more accurate in the future.

Incidentally, another paper now claims that the (rumored) excess of some dark-matter-like radiation coming from dwarf galaxies isn't there.
