Monday, February 02, 2015

It is both ethical and right for an experimenter to correct his mistakes

Interpretations of measurements are inevitably theory-dependent

ATLAS has measured a top-antitop asymmetry that Fermilab had previously claimed to behave strangely. ATLAS got zero – no anomalous effect – within the error margin.

Off-topic: an ex-co-author of mine, Robbert Dijkgraaf, kickstarted the 2015 International Year of Light by a fun 15-minute Amsterdam lecture. Hat tip: Clifford Johnson

Tommaso Dorigo of the competing CMS team didn't like ATLAS's estimates of the error margins:
The ATLAS Top Production Asymmetry And One Thing I Do Not Like Of It.
The three most important points he is making are
  1. that it's a terrible sin for an experimenter to underestimate the error margin of his measurement
  2. to avoid this underestimate, he should actually try to estimate things as accurately as possible because some seemingly "error enhancing" or "conservative" choices may actually lower the final error margin
  3. it's dishonest for an experimenter to modify his methodology after he sees the results
I see the possible "ethical" justification of all these points but at the end, I am closer to disagreeing with two of them (1st and 3rd one). The dear reader is surely asking: Could you tell us some details?

First, some background on the example. They are measuring an asymmetry, a dimensionless quantity \(A\) that belongs to the interval \([-1,+1]\) because it's the ratio of the difference between two counters and the sum of these counters:\[

A^{t\bar t}_C = \frac{ N(\Delta |\eta|\gt 0) - N(\Delta |\eta|\lt 0) }{ N(\Delta |\eta|\gt 0) + N(\Delta |\eta|\lt 0) }

\] Yes, the mathematics is shown in MathJax 2.5 final. The ratio (asymmetry) above applies to a top quark-antiquark pair, \(t\bar t\). A similar ratio \(A^{\ell\ell}_C\) exists for the lepton pairs.

Now, \(N(\dots)\) counts the number of collisions in which the condition \((\dots)\) is satisfied. And \(\eta\) is the pseudorapidity, the Minkowskian hyperbolic angle calculated from \(p^0\) and \(p^z\). In effect, the two different quantities \(N\) in the numerator and the denominator measure collisions in which the final top quark and/or antiquark moves "mostly" in the direction of the equally charged quark in the colliding proton (only the quark-antiquark annihilation contributes to the asymmetry; the gluon-gluon fusion inevitably yields a perfect symmetry) or (the second term) the collisions in which the equally charged quarks tend to flip their \(\eta\)-direction.
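To make the definition concrete, here is a minimal Python sketch of the asymmetry computed from two raw counters. The function name `charge_asymmetry` and the sample counts are hypothetical, not numbers from the ATLAS paper, and the error returned is only the plain binomial statistical error, which ignores backgrounds, unfolding and all systematic effects.

```python
import math
from typing import Tuple


def charge_asymmetry(n_pos: int, n_neg: int) -> Tuple[float, float]:
    """Return (A, sigma_A) from the two event counters.

    n_pos: number of events with Delta|eta| > 0
    n_neg: number of events with Delta|eta| < 0
    sigma_A is the binomial statistical error only (an assumption of this
    sketch); a real analysis adds systematic uncertainties on top of it.
    """
    total = n_pos + n_neg
    if total <= 0:
        raise ValueError("need at least one event")
    a = (n_pos - n_neg) / total
    sigma_a = math.sqrt((1.0 - a * a) / total)
    return a, sigma_a


# Hypothetical counts chosen only for illustration.
a, sigma_a = charge_asymmetry(5100, 4900)
print(f"A = {a:+.3f} +- {sigma_a:.3f}")
```

With ten thousand events the statistical error alone is already about one percent, which gives a feeling for how many top pairs are needed to probe an asymmetry of order \(0.01\).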

The Standard Model predicts both asymmetries to be something like \[

A = +0.010 \pm 0.005,

\] see the paper for details. The Tevatron liked to promote a significantly higher asymmetry while the ATLAS paper ends with numbers like\[

A = +0.02 \pm 0.02

\] which is compatible both with the small Standard Model prediction as well as with zero. So no new physics has been found. But the devil may be in the details. Dorigo's first specific complaint is directed against the sentences in the ATLAS paper
All systematic uncertainties are assumed to be 100% correlated, except...

The systematic uncertainties are treated as 100% correlated.
ATLAS is doing it in this way in order to be sufficiently certain that they won't underestimate their error margin because whenever they do underestimate the errors, they are likely to make a claim they are not certain about, hyping an accurate measurement even though they are not actually able to guarantee that good accuracy.

If a careful experimenter has two choices – either to say something that may be unsupported by his experiments; or to shut up – this careful experimenter will shut up. If I describe this practice in this way, I agree with that: it is the right "conservative" experimentalist approach to remain silent in the absence of reliable enough evidence. However, I disagree once an experimenter spuriously overstates the errors and starts to speak and deny the evidence that already exists.

Why does the calculation tend to increase the estimated error margin if you assume a perfect correlation between two quantities?

Well, first, imagine that you measure the quantity \(X+Y\) and both of the terms have errors, \(\Delta X\) and \(\Delta Y\). What is the error of \(X+Y\)? Well, the error is \(\Delta X+\Delta Y\) if the errors \(\Delta X\) and \(\Delta Y\) are systematic and perfectly correlated. In that case, there is no cancellation between the errors. They simply add up, just as laymen or schoolkids usually assume.

However, if \(\Delta X\) and \(\Delta Y\) errors are independent statistical errors, or if they are systematic errors from different sources that make them uncorrelated, then the error of \(X+Y\) is just the Pythagorean hypotenuse \(\sqrt{(\Delta X)^2 + (\Delta Y)^2}\) which is smaller than \(\Delta X+\Delta Y\). It's smaller because the two errors \(\Delta X,\Delta Y\) are reasonably likely (about 50%) to have opposite signs and partly cancel.

The rule that you shouldn't underestimate your errors tells you that you had better assume that the error of \(X+Y\) is \(\Delta X+\Delta Y\), the larger of the two, if you don't know what the correlation between \(\Delta X\) and \(\Delta Y\) is. So far so good. ATLAS and Dorigo agree.
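The two extremes can be checked with a few lines of Python. This is only a sketch: it assumes the standard textbook interpolation \(\sqrt{(\Delta X)^2+(\Delta Y)^2+2\rho\,\Delta X\,\Delta Y}\) for a general correlation coefficient \(\rho\), and the function name `error_of_sum` and the sample numbers are hypothetical.

```python
import math


def error_of_sum(dx: float, dy: float, rho: float) -> float:
    """Error of X + Y when the errors dx, dy have correlation rho."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    variance = dx * dx + dy * dy + 2.0 * rho * dx * dy
    # Guard against a tiny negative value from rounding near rho = -1.
    return math.sqrt(max(variance, 0.0))


dx, dy = 3.0, 4.0  # hypothetical error margins
print(error_of_sum(dx, dy, 0.0))  # uncorrelated: the hypotenuse, 5.0
print(error_of_sum(dx, dy, 1.0))  # fully correlated: the plain sum, 7.0
```

For a sum of two quantities, the result grows monotonically with \(\rho\), so assuming \(\rho=1\) really is the conservative choice in this simple case.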

Let's continue with something more complicated now. Instead of \(X+Y\), let's measure just one quantity, \(x\), in two different ways that are fundamentally expected to produce the same result. The two measurements yield \(x_1\) and \(x_2\) and you get the best estimate of the universal \(x\) if you compute some average of \(x_1\) and \(x_2\).

If \(x_1\) is measured much more accurately than \(x_2\), it is optimal to identify \(x=x_1\) and measure simply \(x_1\). The sentence with \(1\leftrightarrow 2\) holds, too. But if \(x_1,x_2\) are known equally accurately, it is best to define \(x=(x_1+x_2)/2\), and if the correlation between \(\Delta x_1\) and \(\Delta x_2\) is near zero, we will see that the error of \(x\) ends up being \(\sqrt{2}/2 = 1/\sqrt{2}\) times the error of each \(x_1\) or \(x_2\) separately: we have reduced the error \(1.41\) times. And that's a good thing.

For two general \(\sigma_1\) and \(\sigma_2\) errors of \(x_1,x_2\), the optimum weighted average is\[

x = \frac{{\large \frac{1}{\sigma_1^2} }x_1 + {\large \frac{1}{\sigma_2^2}} x_2}{ {\large \frac{1}{\sigma_1^2}} + {\large \frac{1}{\sigma_2^2}} }

\] The smaller the error \(\sigma_i\) is, the greater weight for \(x_i\) we choose. The sum of the two weights is equal to one. The inverse squares of the error margins have to occur there. I think that I've written a blog post in the past that explained why it is so.
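A small Python sketch of this uncorrelated combination follows. The function name `combine_uncorrelated` and the inputs are hypothetical; the combined error uses \(1/\sigma^2 = 1/\sigma_1^2+1/\sigma_2^2\), which is valid only under the assumption that the two errors are independent.

```python
import math
from typing import Tuple


def combine_uncorrelated(x1: float, s1: float,
                         x2: float, s2: float) -> Tuple[float, float]:
    """Inverse-variance weighted average of two independent measurements."""
    if s1 <= 0.0 or s2 <= 0.0:
        raise ValueError("errors must be positive")
    u1 = 1.0 / (s1 * s1)  # unnormalized weight of x1
    u2 = 1.0 / (s2 * s2)  # unnormalized weight of x2
    total = u1 + u2
    x = (u1 * x1 + u2 * x2) / total
    sigma = math.sqrt(1.0 / total)
    return x, sigma


# Equal errors: the plain average, with the error reduced by sqrt(2).
print(combine_uncorrelated(10.0, 1.0, 12.0, 1.0))
# Unequal errors: the more accurate x1 dominates the average.
print(combine_uncorrelated(10.0, 1.0, 12.0, 3.0))
```

In the second call the result sits much closer to \(x_1\) because its weight is nine times larger, and the combined error is only slightly better than \(\sigma_1\) alone.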

OK, but this is only good if the errors \(\Delta x_1\) and \(\Delta x_2\) are uncorrelated. Note that the formula above contains a sum of things like \(1/\sigma_1^2+1/\sigma_2^2\) but there are no mixed terms such as \(1/\sigma_1\sigma_2\). Effectively, we have assumed a \(2\times 2\) matrix to be diagonal and this assumption is unlikely to be accurately true.

What is the right generalization of the formula for the weights etc. above in the case of a nonzero correlation? We need to consider the covariance matrix\[

V = \pmatrix{ \sigma_1^2& \rho \sigma_1\sigma_2 \\
\rho \sigma_1\sigma_2 &\sigma_2^2}

\] where \(\rho\in [-1,+1]\) is the correlation coefficient. The optimum weights \(w_1,w_2=1-w_1\) in\[

\hat x = w_1 x_1+w_2 x_2

\] may be written as\[

w_1 = \frac{\sigma_2^2 - \rho \sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2-2\rho\sigma_1\sigma_2},\quad w_2 = \frac{\sigma_1^2 - \rho \sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2-2\rho\sigma_1\sigma_2}

\] and if you calculate the inverse matrix \(V^{-1}\) and play with \(\chi^2\), you may determine that the error of the weighted average \(\hat x\) obeys\[

\begin{aligned}
\frac{1}{\sigma^2} &= \frac{1}{1-\rho^2} \left( \frac{1}{\sigma_1^2} +\frac{1}{\sigma_2^2} -\frac{2\rho}{\sigma_1\sigma_2}\right)=\\
&= \frac{1}{\sigma_1^2} + \frac{1}{1-\rho^2}\left( \frac{\rho}{\sigma_1} - \frac{1}{\sigma_2} \right)^2
\end{aligned}

\] Fine. The mathematics got more complicated, as expected from promoting one number to a \(2\times 2\) matrix demanding various extra parameters such as \(\rho\) for the correlation. The first form of the result makes the \(1\leftrightarrow 2\) symmetry manifest.
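For readers who want to play with the numbers, here is a minimal Python sketch of the correlated combination built from the covariance matrix \(V\). The function name `combine_correlated` and the sample inputs are illustrative choices, not anything from the ATLAS paper. Each weight is proportional to the other measurement's variance minus the covariance, so the weights reduce to the inverse-variance ones when \(\rho=0\).

```python
import math
from typing import Tuple


def combine_correlated(x1: float, s1: float, x2: float, s2: float,
                       rho: float) -> Tuple[float, float, float, float]:
    """Minimum-variance average of two correlated measurements.

    Returns (x_hat, sigma, w1, w2) with w1 + w2 = 1.
    """
    if s1 <= 0.0 or s2 <= 0.0:
        raise ValueError("errors must be positive")
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    cov = rho * s1 * s2
    denom = s1 * s1 + s2 * s2 - 2.0 * cov  # positive for |rho| < 1
    w1 = (s2 * s2 - cov) / denom
    w2 = (s1 * s1 - cov) / denom
    x_hat = w1 * x1 + w2 * x2
    variance = s1 * s1 * s2 * s2 * (1.0 - rho * rho) / denom
    return x_hat, math.sqrt(variance), w1, w2


# Hypothetical measurements with a moderate positive correlation.
x_hat, sigma, w1, w2 = combine_correlated(10.0, 1.0, 12.0, 2.0, 0.3)
print(f"x = {x_hat:.3f} +- {sigma:.3f}, weights = ({w1:.3f}, {w2:.3f})")
```

The variance returned here is just the reciprocal of the first, symmetric form of \(1/\sigma^2\), so the two can be cross-checked numerically.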

The second, final form of \(\sigma\) is asymmetric and has a puzzling feature. The error margin \(\sigma\) actually decreases (because \(1/\sigma^2\) increases) as \(\rho\) increases in the region \(\rho \gt \sigma_1/\sigma_2\), with \(\sigma_1,\sigma_2\) kept fixed. It is puzzling because at the beginning, I suggested that a higher correlation \(\rho\) between the errors makes the total error "more systematic", so it should be closer to the simple sum and therefore greater than the error closer to the Pythagorean hypotenuse, something we get for a smaller \(\rho\).

But here, some extra increase in the correlation coefficient \(\rho\) actually reduces the total error. If you look at other interesting things occurring for \(\rho\gt \sigma_1/\sigma_2\), you will see that this inequality is equivalent to \(w_2\lt 0\): one of the weights becomes negative. Under certain circumstances (which didn't happen in the ATLAS paper but they could materialize in other experimental research), the weights are unexpectedly negative!

And it shouldn't be surprising that for negative weights, our usual intuition about "which formula for the error margin ends up leading to a greater value" may easily fail. Mathematics shows very clearly when it fails, why it fails, and how it fails.
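A quick numerical scan shows how the intuition fails. The sketch below assumes hypothetical errors \(\sigma_1=1\) and \(\sigma_2=2\), so the threshold sits at \(\rho=\sigma_1/\sigma_2=0.5\); the helper names `weight_2` and `combined_sigma` are illustrative only.

```python
import math


def weight_2(s1: float, s2: float, rho: float) -> float:
    """Optimal weight of the second measurement for |rho| < 1."""
    denom = s1 * s1 + s2 * s2 - 2.0 * rho * s1 * s2
    return (s1 * s1 - rho * s1 * s2) / denom


def combined_sigma(s1: float, s2: float, rho: float) -> float:
    """Error of the optimally weighted average for |rho| < 1."""
    denom = s1 * s1 + s2 * s2 - 2.0 * rho * s1 * s2
    return math.sqrt(s1 * s1 * s2 * s2 * (1.0 - rho * rho) / denom)


s1, s2 = 1.0, 2.0  # hypothetical error margins
for rho in (0.0, 0.25, 0.5, 0.75, 0.9, 0.99):
    w2 = weight_2(s1, s2, rho)
    sigma = combined_sigma(s1, s2, rho)
    print(f"rho = {rho:4.2f}   w2 = {w2:+.3f}   sigma = {sigma:.3f}")
```

In this scan, \(\sigma\) grows with \(\rho\) until it reaches \(\sigma_1\) at \(\rho=0.5\), exactly where \(w_2\) crosses zero, and then it shrinks again as \(\rho\) approaches one. So the "perfect correlation" assumption is not the most conservative one when the two errors are this different.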

Because this general discussion isn't included in the ATLAS paper – they didn't need it – you may ask whether ATLAS is aware of the possibility that the weights may go negative if the correlation coefficient is too high and if the two separate error margins are very different in magnitude. I think that many ATLAS members know this subtlety as well as Dorigo or I do. And most of the others will understand this conclusion when they read this blog post or Dorigo's one, for that matter.

However, an important point is that they really didn't need to know about this subtlety. And if \(\rho\) isn't too high and/or if \(\sigma_1,\sigma_2\) are closer to one another, the usual intuition does work and the assumption of the "perfect correlation" produces the safest i.e. "most conservative" i.e. largest estimates for the error margin \(\sigma\).

By the way, the value of \(\rho\) is also "measured", in some sense, although it is not the most important thing that the experimenters want to know. While the experimenters may work with very complicated statistical distributions – for one variable or many variables – doing so is often inappropriate because the finer features of these distributions, like \(\rho\), are not too important and can't be measured too reliably or accurately, anyway. So sometimes (or usually) a simpler distribution that depends on fewer parameters – normal distribution and/or simple assumptions about the correlations or the relative magnitude of various errors – is a better, more pragmatic assumption for an experimenter.

Now, let us assume that an experimenter ends up in the situation with \(w_2\lt 0\) where the dependence of the error margin on \(\rho\) is surprisingly a decreasing function. What should happen if the experimenters end up there?

Well, my answer is that they should realize that there was a mathematical flaw or loophole in their argument or expectation or guess concerning the question "which assumption about \(\rho\) yields the largest, most conservative \(\sigma\)". They should discover this subtlety – or be told about it – and take it into account.
Even if you are an experimenter and you are making a mathematical or logical error, you must fix it!
In several of the comments on his own blog, Dorigo seems to disagree with this simple slogan. He says that it is a sin for an experimenter to modify his methodology "after the fact" because it leads to a "bias".

In some circumstances (that I will discuss momentarily), this warning may be right. But in this case, it is obviously bullšit. Why? Simply because the naive method of dealing with the errors, which expects them to be increasing functions of \(\rho\), is mathematically wrong for some values of the parameter \(\rho\). And if anyone is making a mathematical error, he must fix it. In other words:
Mathematics is always valid and to use mathematically sound and error-free steps is a duty for a theorist but it is a duty for a good experimenter, too. The desire to fix something that was seen to be a mathematical flaw isn't any "bias" because the correct theory that agrees with all the experiments – whatever the theory is – cannot violate the laws of mathematics, either.
So I fundamentally disagree with Dorigo's final, third point at the top that experimenters shouldn't update their methodology after they are led to some results that they assumed to be impossible. Mathematical errors and experimentally disproved assumptions – and indeed, the assumption \(\rho\lt \sigma_1 /\sigma_2\) was experimentally disproved in our (possible) thought experiment – simply have to be fixed.

Preserving mathematical errors and wrong assumptions even though it is already known that they are errors or wrong assumptions is wrong, wrong, wrong, and no amount of ideology that experiments should be perfectly "blind" can be used as a legitimate excuse.

In the end, the gap between Dorigo's opinions and mine is that he is a big hater of physical theories in general. More precisely, he believes that he is extracting the truth about Nature from the experiments "directly". But this is a completely wrong way of looking at all of science, especially at the relationship between theory and experiments. Why? Because:
Every experiment, however blind, must always be evaluated and interpreted within some theoretical framework, some assumptions.
There is simply no way to extract the truth about Nature from an experiment that would be completely "independent" of any theoretical picture. One may use a more specific or more predictive theory or a model (e.g. a top-down theory in particle physics), or a phenomenological description of the data that doesn't even force one to learn what a Lagrangian is. But even the latter – even the simplest extrapolations or interpolations we may be making – are always another theoretical framework.

Every experimenter has to make some assumptions when he evaluates the experiments and his assumptions – for example the prior probabilities of various competing propositions – have to be balanced and vaguely compatible with what the theorists are saying as a community. In particular, no experimenter has the right to eliminate theories "a priori". Experimenters may only eliminate theories by falsification. For example, when an experimenter says that he doesn't like SUSY "a priori" and it affects his behavior, he isn't an honest scientist because he is making judgements – and is affected by them – about things he has no clue about. Experimenters shouldn't try to become "alternative theorists" and build on these "alternative theories" because they're not good theorists.

If an experimenter implicitly assumed that the weight \(w_2\gt 0\) and he discovers that \(w_2\lt 0\) is also possible, it's an important discovery that he simply cannot hide, mask, or forget. By having made the invalid assumption that \(w_2\gt 0\) always holds, he simply made a mistake that has to be fixed. He has to fix his experiments so that they acknowledge the possibility \(w_2\lt 0\). And theorists who were assuming that \(w_2\gt 0\) is universally true must also fix their sloppy arguments. In other words:
It is unforgivable if someone keeps on "squeezing" the data into a theoretical framework that is known to be wrong, if he insists on (increasingly twisted and unnatural) interpretations of the measured data that preserve some assumptions (which were really proven to be incorrect).
If a new set of assumptions is found that is more smoothly compatible with the data, it has to be immediately acknowledged as a possibility. All experiments that could have suffered from the mistake have to be corrected, revised, and interpreted again, and if it turns out that the new corrected "framework" is naturally and much more compatible with all the experimental data, this new framework must become the new "standard"! The "older" theories obviously can't have any permanent "monopoly": theories have to be treated fairly and equally regardless of their "age".

In effect, Dorigo is saying that it is always unethical for an experimenter to admit that he or his work has been naive, stupid, and spoiled by incorrect assumptions. It shouldn't be surprising that a man who believes such a thing continues to be naive, stupid, and spoiled by incorrect assumptions when it comes to almost all theoretical questions in particle physics.

He also discusses his views about a slightly different "after the fact" modification of the experimental methodology:
This problem is common to blind searches too, when one, upon opening the box, finds out that the data is smaller than the total background by 2-sigma or so. What to do? If you go back and revise your background estimate, you might be doing a good thing and you might not; doing this systematically will remove 2-sigma fluctuations from the pool of your results. In total, you will be publishing biased limits.
Sorry but even in this case, Dorigo's strict approach is the approach of a dogmatic experimenter who can really never discover anything new and important.

If an experimenter makes a measurement and from the beginning, he assumes that the measured quantity \(N\) will obey \(N\gt B\), i.e. that the measured quantity is higher than some background (whatever is the reason why he assumed it – it may be his own sloppiness, the sloppiness of theorists who encouraged him to do the experiment, or some more or less advanced theories or principles promoted by theorists that seem to imply \(N\gt B\) in general), and if he actually sees any evidence that \(N-B\gt 0\) doesn't hold in general, then it is obviously a duty for the experimenter to report this violation of the assumption.

If the confidence level that \(N-B\lt 0\) is just 2 sigma, it's just an "emerging hint" of some surprising observation. At the 3-sigma level, it starts to be somewhat interesting. Above 5 sigma, he may want to suggest a discovery. But the rules are always the same and:
The experimenter can simply never hide or obfuscate a discrepancy between his experiments and theoretical assumptions about the data. He can never promote his assumptions to a dogma that defines the morality because the assumptions may very well be wrong and finding such situations is really one of the most important types of goals of all the experiments!
So if an experimenter measures a 2-sigma and perhaps 3-sigma deficit of some events which is unexpected, it is some "emerging hint" of evidence that either he was doing something incorrectly, or some theories or assumptions are invalid. All these possibilities have to be investigated and if some mistake in the experiment or the theory is found, thanks to this search, and it looks "really likely" that the error is there, it simply has to be acknowledged and fixed. Such a result of the experiment may end up being much more important than the originally stated goal!
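To get a feeling for what the 2-sigma, 3-sigma and 5-sigma labels mean as probabilities, one may convert them to one-sided Gaussian tail probabilities. This is a sketch that assumes purely Gaussian fluctuations and a one-sided test (a deficit only); the function name `one_sided_p_value` is hypothetical.

```python
import math


def one_sided_p_value(n_sigma: float) -> float:
    """Chance that a Gaussian fluctuates by at least n_sigma in one direction."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


for n in (2, 3, 5):
    print(f"{n} sigma: p = {one_sided_p_value(n):.2e}")
```

A 2-sigma deficit happens by chance in roughly 2.3% of the cases, a 3-sigma one in about 0.13%, and a 5-sigma one in about three cases in ten million, which is why only the last one is usually called a discovery.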

Just imagine if Dorigo's dogmatic attitude were applied e.g. to Bell's inequalities. Imagine that theorists hadn't discovered quantum mechanics in time and an experimenter measured something and, in the process, also saw that Bell's inequality is violated. Should he refuse to modify his methodology because it's "unethical" to fix the mistakes "after the fact"? I don't think so. He should notice and publish the results – and they could indeed become the most important experimental results in a century, results that would soon lead to the theoretical discovery of quantum mechanics, too.

As I have already said, Dorigo's mistake is to view experimenters as some people who always know what they should assume, who promote their prior assumptions to the benchmarks of morality, and who extract the truth about Nature "directly", without any need for theories and theorists. In the real world of legitimate science, experimenters always have to make theoretical assumptions – usually assumptions discovered by theorists or phenomenologists – and they are effectively comparing the likelihood of different competing theoretical descriptions of the experimental situation.

The right guarantee that "prevents the bias" simply means that the experimenters are not allowed to make any steps that would selectively make one theoretical paradigm look less true than another theoretical paradigm without real data that may discriminate between them. But if an experimenter finds something that seems to disagree with the assumptions made by everyone or almost everyone, they are obliged to revise their methodology and their interpretation and take the new possibility into account.

At the beginning, I mentioned the three claims by Dorigo. As you just saw, I almost entirely disagree with Dorigo's suggestion that experimenters shouldn't fix things that turn out to be errors. What about the other two claims? Well, I tend to agree that experimenters should try to estimate their error margins as fairly and accurately as possible – they shouldn't artificially increase them. If one artificially increases error margins, it can lead him to deny the evidence supporting some theory that already exists, and despite the "opposite sign", that's pretty much "an equally bad sin" as to claim evidence for a theory, evidence that isn't really solid or even semi-solid yet.


  1. Isn't the bias introduced when you fix something after the fact possible to compensate for? I see how the "how many sigma" is affected by changing things that would not have been changed for some of the outcomes. But it should be possible to estimate the error introduced, shouldn't it?

    Agree of course that you shouldn't throw out perfectly valid experimental data because preceding plans on how to interpret the results were flawed.

  2. Exactly, JollyJoker, and yours is the right way to do it.

    The data always carry some information and may be used to check one hypothesis or another hypothesis, and all of this may be done in a less correct or more correct way.

    The same data may be and should be reused to tell us as much as possible about the most refined questions and most accurate theories for which the data is relevant.

    What's really important to avoid the bias is that the methodology under which the raw data were obtained is not changing as a function of some undefined moods or the experimental results in other experiments. For example, it is wrong to eliminate data points that differ too much from what other experimenters obtained - because others may be wrong.

    But the methodology by which one interprets some raw data always depends on some theoretical assumptions and of course it's good to make this theoretical framework as correct as possible, and as close as possible to the cutting-edge viable theories that may explain the experiments.

  3. Seriously, I am surprised that this topic ("It is both ethical and right for an experimenter to correct his mistakes") is even something to write about. It seems self-evident: if one makes a mistake, admit it and proceed onward.

  4. Alas, as an experimental particle physicist I should say you are right. It is really a small technical/semantical problem, but you are more right than Tommaso. Nevertheless, you have to take what TD says modulo his passion for polemics. At the end it is really a small thing and I'm surprised you spent such a long post on it..

  5. I agree, it's a triviality. Nevertheless, people like Tommaso Dorigo, an experimenter at the LHC, disagree because they worship a principle that the experimenter is not allowed to change his methodology (fix his mistakes), ever, because that would be a modification of the criteria "after the fact".

  6. Thanks, Kiril, thanks for your near-agreement with the substance but I don't think it's a small thing. Indeed, it deserved one of 6,000 blog posts here because it's a large enough thing. ;-)

    I think that this disagreement is one of the main reasons that lead to so many people's hatred of the insights that physics has found in the recent 40 or 60 or perhaps 90 years. They have simply learned to love the principle that *every* change of their theoretical opinion made necessary by the latest insights would be an unethical fudge.

    They have upgraded the principle "never learn anything new and never polish your theoretical opinions" to a principle of morality. And as you probably agree, this principle of morality has some "justifiable core" because fudging may be bad. But they blow it completely out of proportion so that their strong version of the principle would make *any* progress in science impossible.

    This view of theirs is a part of the Popperist religion, the naive idea that the right theories must be immediately guessed in their final form (and then decided Yes/No by straightforward experiments), otherwise something unethical is going on. Of course, that's almost never the case. Theories are almost never written "immediately" in their final form from scratch, and exactly these adjustments they abhor are essential in the true evolution that makes most of the scientific progress, including the big breakthroughs, possible.