## Friday, December 23, 2005

### E=mc2: a test ... interplay between theory and experiment

An experiment claimed to be the most accurate test of Einstein's famous identity "E=mc2" has been performed by physicists on the other side of Central Square, at MIT.

Their accuracy is 55 times better than that of previous experiments. They measured the change in the mass of a nucleus associated with the energy it emits after absorbing a neutron. I find their promotion of the experiment slightly dishonest:

• "In spite of widespread acceptance of this equation as gospel, we should remember that it is a theory," said David Pritchard, a professor of physics at MIT, who along with the team reported his findings in the Dec. 22 issue of Nature. "It can be trusted only to the extent that it is tested with experiments."

The words "it is [just] a theory" remind me of something. The formula is not just "some" theory. It is an inevitable, robust, and rather trivial consequence of special relativity - a theory that has been tested in hundreds of other ways. Many different experimental constraints are known and many of them are more stringent than those from the current experiment. Naively and dogmatically speaking, the formula can only be trusted to the extent that it is tested with similar experiments.

Realistically speaking, the formula - and many other formulae - can be trusted well beyond these experiments. Everything depends on the amount of reasoning that we are allowed to perform with our brains in between the experiments. It is not true in science that every new experiment is really new. The whole goal of science is to know the results of a huge class of experiments without actually performing them. We can make predictions, some very general and some less general, and science is indeed able to do such things. If we are allowed to think a lot, the experiment is not terribly thrilling and its result is known in advance. There is simply no way to design a theory that predicts a different result and remains compatible with the experiments that have already been made.

Also, a theorist would not say that such an experiment is testing "E=mc2"; it is very hard to explain to a particle physicist why one thing they measure is "mass" while the other is something else, namely "energy". They just measure several different forms of the same quantity - one that can be called either mass or energy. Finally, if some discrepancy were found, no sane physicist would interpret it as a violation of this particular formula, because we have no candidate framework that would be consistent with the basic properties of the Universe and yet violate "E=mc2". For example, Noether's theorem guarantees the conservation of only one quantity associated with time-translational invariance: the total mass/energy. Of course, if an experiment did disagree with the theory, we would have to look for other, more technical and subtle explanations of the discrepancy.

But all these comments are completely hypothetical because this is just a very low-energy experiment described by well-known physics, and we know that there won't be any discrepancies.
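To make the compared quantities concrete, here is a rough worked version of the mass-energy balance for neutron capture. The symbols and the 8 MeV figure are illustrative assumptions (a typical order of magnitude for the energy released in such a capture), not numbers taken from the MIT paper, and the small recoil energy of the nucleus is neglected:

$$
M({}^{A}\mathrm{X}) + m_n - M({}^{A+1}\mathrm{X}) = \frac{E_\gamma}{c^2},
\qquad
E_\gamma \approx 8\ \mathrm{MeV}
\;\Rightarrow\;
\Delta m \approx \frac{8\ \mathrm{MeV}}{931.494\ \mathrm{MeV}/\mathrm{u}} \approx 8.6\times 10^{-3}\ \mathrm{u}.
$$

The left-hand side is obtained by weighing the nuclei, the right-hand side by measuring the emitted radiation, which is why a theorist sees two measurements of the same quantity rather than a test of a separate formula.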

#### Isolating data to compare

There is one more topic related to the interaction between theories and experiments. Steve McIntyre is trying to clear up some of Rasmus Benestad's confusion here. Benestad writes, among many other bizarre things, the following:

• When ARIMA-type models are calibrated on empirical data to provide a null-distribution which is used to test the same data, then the design of the test is likely to be seriously flawed. To re-iterate, since the question is whether the observed trend is significant or not, we cannot derive a null-distribution using statistical models trained on the same data that contain the trend we want to assess. Hence, the use of GCMs, which both incorporates the physics, as well as not being prone to circular logic is the appropriate choice.

In other words, his proposed strategy is to pick a favorite model of yours - a model that predicts a "signal" - and to work on showing that the observations are consistent with the model. Benestad clearly believes that it is not necessary to try to verify the hypothesis that the observations are a "signal" in the first place. The statement that we observe a "signal" rather than noise is a dogma for him. No analysis of the "natural background" and its statistical parameters is required; in fact, it is not even allowed, Benestad argues.

I find his reasoning circular, flawed, and downright stupid. This is exactly how crackpots operate: they almost always want to make a big discovery - to find a huge signal - without first learning what the "background" is above which their hypothetical signal should stand out. To be sure: of course we must first know what to expect without a conjectured new effect if we want to decide whether the new effect exists. And if such expectations are determined experimentally, of course we are not allowed to include the new effect when we determine the expectations. (This reminds me of the fermionic zero mode debate.)

Let me try to describe the situation in one more way. Rasmus is not right when he says that we cannot derive a null hypothesis from the datasets themselves. This is what we’re doing in many situations in science - every time an actual nontrivial check of our theories is provided by reality. Let us look at one example; there are thousands of them in physics.

#### CMB and the isolation of theory and experiment

When the cosmic microwave background was discovered, no one had a complete theory. It was determined from the data that the microwave background was thermal, and the data also gave its temperature: 2.7 kelvins. Of course one had to know that “being thermal” was a natural answer about the structure of radiation; in fact, the CMB is still the most accurate natural thermal blackbody curve we have seen so far. But you don't need to understand or calculate anything in general relativity to see that the observed radiation is approximately thermal!
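A minimal sketch of what "determined from the data" can mean here, assuming made-up synthetic measurements rather than real CMB data. The function names `planck_nu` and `fit_temperature`, the frequency range, and the 1% scatter are all illustrative choices. The fit uses only the Planck formula and no cosmology at all.

```python
import math
import random

H = 6.62607015e-34   # Planck constant, J s
K_B = 1.380649e-23   # Boltzmann constant, J/K
C = 299792458.0      # speed of light, m/s


def planck_nu(nu_hz, temp_k):
    """Blackbody spectral radiance B_nu(T) in W m^-2 Hz^-1 sr^-1."""
    x = H * nu_hz / (K_B * temp_k)
    return 2.0 * H * nu_hz ** 3 / C ** 2 / math.expm1(x)


def fit_temperature(freqs_hz, intensities, t_min=1.0, t_max=5.0, step=0.001):
    """Grid search for the temperature minimizing the summed squared
    relative residuals between the data and a blackbody curve."""
    best_t, best_cost = None, float("inf")
    n_steps = int(round((t_max - t_min) / step))
    for i in range(n_steps + 1):
        t = t_min + i * step
        cost = sum(((obs - planck_nu(nu, t)) / obs) ** 2
                   for nu, obs in zip(freqs_hz, intensities))
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


# Synthetic "observations" (an assumption, not real measurements):
# a 2.7 K blackbody sampled at 20 frequencies with 1% Gaussian scatter.
rng = random.Random(0)
freqs = [30e9 * (k + 1) for k in range(20)]          # 30 GHz ... 600 GHz
data = [planck_nu(nu, 2.7) * (1.0 + 0.01 * rng.gauss(0.0, 1.0))
        for nu in freqs]

t_fit, cost = fit_temperature(freqs, data)
print(f"best-fit temperature: {t_fit:.3f} K")
```

The only theoretical input is the shape of a thermal curve; the temperature and the quality of the fit come from the data alone, and they can afterwards be compared with whatever a cosmological model predicts.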

The fluctuations of the temperature were determined from the data, too. Their dependence on the scale was also found, and the spectrum was seen to be approximately scale-invariant. Finally, deviations from scale invariance are also seen in the data.

The main conclusions - thermal curve; scale-invariant fluctuations; violations of scale invariance in a particular direction; various correlations etc. - are derived directly from the observed data.

Then you independently pick your Big Bang theory and you see that it naturally explains the thermal distribution because everything was in equilibrium roughly 300,000 years after the Big Bang, when the radiation we see today was released. Also, inflation, which took place long before this era - a fraction of a second after the Big Bang - explains the scale invariance. More detailed calculations that depend on the inflationary model also predict some deviations from scale invariance, and many models may be falsified in this way. In fact, the last observation - the deviations from scale invariance - does not yet have a generally accepted theoretical description even though people can, of course, fudge their models to get an agreement, much like the climate modellers do.

What I want to say is that there must separately exist conclusions derived from the experiments and conclusions derived just from the theory, and these two sets of conclusions must be compared. If someone is showing an agreement simultaneously by twisting and cherry-picking the data according to the theory and fudging the theory according to the data, merely to show that there is a roughly consistent picture, then it is no confirmation of “the” theory. In fact, there is no particular theory, just a union of ill-defined emotions whose details can be changed at any time. It’s not science and one cannot expect a "theory" obtained in this way to have any predictive power. This is how the priests in the 15th century argued that the real world is consistent with the Bible.

#### The order of discoveries must be arbitrary

A correct scientific theory must be able to make predictions of some feature(s) of the observed data before the data is observed - this is why it is called a prediction - and the same thing holds vice versa. Nontrivial experimental facts must be determinable and describable without the ultimate theory before this theory is found, otherwise they cannot be used to determine the theory. In other words, it must always be a historical coincidence whether theory or experiment delivered the result first.

Of course I am not saying that the actual evolution of science is decoupled into theorists and experimentalists who don’t talk to each other. What I am saying is that they should not be talking to each other - and they should never build their research on their friendship - when they try to determine whether a theory agrees with some particular observations.

In this particular case, whether some warming is an example of natural persistence or an effect caused by XY is, of course, an important scientific question. Long-term persistence is the much more likely, “default” explanation, because even if it were not the cause, there would still be very many factors XY that could really be causing it. If we don’t have an observation suggesting that the persistence does not exist (for example, accurate enough observations of 15th-century temperatures), we should not assume that it does not exist. Of course it probably does, and a goal of the scaling papers is to find phenomenological laws that would help to determine the color of the noise - and hence also the persistence at various time scales - from the data, regardless of some additional effects caused by anyone else.

The qualitative question whether the persistence exists is quite clear. It does. The noise exists at all scales. The real question is a quantitative one.
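The quantitative point can be illustrated with a small Monte Carlo sketch. It assumes the simplest kind of persistent noise, an AR(1) process with a hypothetical lag-one correlation of 0.9, which is far cruder than the scaling laws mentioned above; the helper names are invented for the illustration. It compares the spread of linear trends fitted to trend-free series with and without persistence, both normalized to the same variance.

```python
import math
import random


def ols_slope(y):
    """Least-squares slope of y against the time index 0, 1, ..., n-1."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def ar1_series(n, phi, rng):
    """Stationary AR(1) noise with unit variance: x[t] = phi*x[t-1] + eps[t]."""
    scale = math.sqrt(1.0 - phi * phi)   # keeps the marginal variance at 1
    x = rng.gauss(0.0, 1.0)
    out = [x]
    for _ in range(n - 1):
        x = phi * x + scale * rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def slope_spread(n, phi, trials, rng):
    """Standard deviation of trends fitted to trend-free AR(1) noise."""
    slopes = [ols_slope(ar1_series(n, phi, rng)) for _ in range(trials)]
    mean = sum(slopes) / trials
    return math.sqrt(sum((s - mean) ** 2 for s in slopes) / (trials - 1))


rng = random.Random(1)
white = slope_spread(n=100, phi=0.0, trials=500, rng=rng)   # no persistence
red = slope_spread(n=100, phi=0.9, trials=500, rng=rng)     # strong persistence
print(f"spread of spurious trends, white noise:     {white:.4f}")
print(f"spread of spurious trends, AR(1), phi=0.9:  {red:.4f}")
```

In a run of this kind the persistent series produce markedly larger spurious trends than white noise of the same variance, so a trend that looks significant against a white-noise background may be unremarkable against a persistent one. How large the effect is in the real climate depends on the actual color of the noise, which is exactly what has to be measured.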

#### Background vs. signal

It is extremely important to know what the “natural background” is if we try to figure out whether there is a new “effect”. Some people like Rasmus Benestad just don’t want to study the natural background at all - they immediately want to get effects (and the attention of the press, in which they're pretty successful because many journalists are pretty dumb) - which is why I think that they are crackpots. As mentioned previously, one of the defining features of crackpots is that they want to make big discoveries before they learn the science describing the “simpler” phenomena that underlie their discovery.

Let me explain in one more way why their research is defective.

Whenever we try to design scientific theories that describe something, we must know which quantities in reality will be described by our theories and we must be able to isolate them.

By isolating them, I mean both theoretical and experimental isolation. In theories we must know - or at least feel - that the effects we have neglected do not change our predictions too much. In experiments we must know - or at least have rational reasons to believe - that the effects we observe are not caused by something else, something “more ordinary”. When we try to observe telepathy, for example, we must know that the people are not communicating by some more "natural" methods.

The climate modellers almost never try to follow these lines. They have a completely vague, slippery set of ideas that predict anything and everything - warming, cooling, bigger variations, smaller variations, more hurricanes, weaker winds, increased circulation, diminished circulation, more ice in Antarctica, less ice in Antarctica, and so forth - and then they’re arguing that the data agree with these predictions. Of course they emphasize the points where the data agree and de-emphasize those where they disagree. This is not science.

Of course there is no direct way one can ever construct a scientific framework out of this mess. To do science, one must focus on a limited class of questions that are sufficiently well-defined and that have a chance to be “cracked” by a theory. I am sure that there are many nice laws about the climate that we don't know yet, and I am equally sure that the work of most of the "mainstream" climate scientists today is not helpful in revealing these laws.

When we try to argue that humans are suddenly dictating the climate trends - after roughly 4.5 billion years during which they were dictated by other, more natural things - it is a rather extraordinary conjecture that deserves extraordinary evidence. To get any evidence, it is absolutely necessary to understand how the climate was behaving for those billions of years before the hypothetical “revolution” occurred around 1917. We must know what the fluctuations were and how they depended on the time scale. We can only learn such things reliably by observing the real world. Only once we know the background can we study the additional effects.

Studying additional trends above a background that we don’t need to understand is equivalent to Biblical literalism.

#### Summary

Some readers may feel that the two parts of this text contradict each other because I defend theory in the first part and observations in the second. However, I am convinced that every sane scientist (and informed layman) knows that both theory and experiment are important. My goal was certainly not to shift the balance to one side. My goal was to emphasize that science should be looking for robust conclusions and theories, and that it should try to find the situations in which the phenomena exhibit themselves in the sharpest possible way. To achieve this goal, it is necessary to try to follow these principles:

• try to isolate the "signal" that you are interested in as well as you can
• when your signal exists above a certain "background", you must definitely try to understand the background first
• if you can find an idealized situation in which one signal is isolated from some other effects that you're not interested in, study this situation
• if you cannot find an idealized situation and if everything looks to you like quantitatively undescribable chaos that you want to match by a computer-generated chaos, then it means that you still misunderstand what's going on; avoid the quagmire and return to point #1
• if you have a theory, be sure to deduce and decide what kind of quantities the theory should be able to predict
• if your theory agrees with only some observations, never fool yourself and never try to de-emphasize the observations you know to disagree with your theory
• if your theory or model only agrees "roughly" with 24 features of the data but there are 25 parameters or assumptions that led to your model, be sure that you cannot claim to have established your model or its assumptions
• isolate the assumptions of your theories (and open questions) from each other and try to test them separately whenever you can
• if you try to explain experimental data, always ask whether there exists a simpler and more natural theory than yours that would be able to do approximately the same job
• if there is a more natural theory with fewer parameters, go for it
• never believe that your theory is superior just because it is using the buzzwords - or approximate concepts and laws - that are more frequent in physics; this is not how the better theories are identified
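The point about 24 features and 25 parameters can be made concrete with a small sketch. It assumes the "model" is a plain polynomial and the "features" are 24 numbers of pure random noise, both of which are illustrative choices. A polynomial with 24 coefficients already passes exactly through any 24 points, so a 25th parameter would be entirely free.

```python
import random


def lagrange_eval(xs, ys, x):
    """Evaluate at x the unique polynomial of degree len(xs)-1
    that passes through all the points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


rng = random.Random(2)
xs = [float(k) for k in range(24)]
ys = [rng.gauss(0.0, 1.0) for _ in xs]   # 24 "features" that are pure noise

# Agreement with the 24 features the model was tuned to.
max_residual = max(abs(lagrange_eval(xs, ys, x) - y) for x, y in zip(xs, ys))

# "Prediction" for a 25th feature, one step beyond the data.
prediction = lagrange_eval(xs, ys, 24.0)

print(f"largest misfit on the 24 fitted features: {max_residual:.3g}")
print(f"prediction for the next feature: {prediction:.3g}")
```

The agreement with the fitted features is perfect even though there is nothing to explain, while the extrapolated value is typically many orders of magnitude away from anything in the data. Agreement bought with as many parameters as data points says nothing about whether the model is right.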

#### snail feedback (1) :

You said:

What I want to say is that there must exist separately conclusions derived from the experiments; and conclusions derived just from the theory. And these two sets of conclusions must be compared. If someone is showing an agreement simultaneously by twisting and cherry-picking the data according to the theory and fudging the theory according to the data, merely to show that there is a roughly consistent picture, then it is no confirmation of “the” theory. In fact, there is no particular theory, just a union of ill-defined emotions whose details can be changed at any time. It’s not science and one cannot expect a "theory" obtained in this way to have any predictive power. This is how the priests in 15th century argued that the real world is consistent with the Bible.

I agree with this comment insofar as one should not try in an ad hoc way to fudge theory and to twist experiment so that they agree.

However, the Bayesian approach is a rigorous and consistent way of matching up theory and data. When you look at what Bayes gets up to when you relate a number of alternative models (or a continuum of alternatives) to the data (assumed noisy and incomplete, as usual), then in effect Bayes is doing all of this twisting and fudging that you don't like.

Bayes considers all of the alternative models that you have told it about, and it tells you how to compute a probability for each of these alternatives. The structure of the expression for this probability is where you find the "twisting and fudging" going on. For instance, with a continuum of models that differ only in the value of a fundamental constant, the Bayesian approach tells you how to compute a "posterior" probability over the values of the constant, and the structure of this expression tells you how much you have to "twist and fudge" to make theory and data fit each other.

As long as you "twist and fudge" rigorously and consistently (i.e. use Bayes) then it is a scientifically respectable activity!
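A minimal sketch of the continuum case described above, under stated assumptions: the "fundamental constant" is a single number `mu`, the data are noisy measurements of it with known Gaussian scatter, the prior is flat, and the continuum is approximated by a grid. All names and numbers are hypothetical.

```python
import math
import random


def grid_posterior(data, sigma, grid):
    """Posterior probability of each grid value of the constant,
    for a flat prior and Gaussian noise of known width sigma."""
    log_like = [-0.5 * sum(((d - mu) / sigma) ** 2 for d in data)
                for mu in grid]
    peak = max(log_like)                       # subtract for numerical stability
    weights = [math.exp(ll - peak) for ll in log_like]
    norm = sum(weights)
    return [w / norm for w in weights]


rng = random.Random(3)
true_value, sigma = 1.5, 0.5                   # assumed for the illustration
data = [true_value + rng.gauss(0.0, sigma) for _ in range(50)]

grid = [k * 0.01 for k in range(301)]          # candidate values 0.00 ... 3.00
post = grid_posterior(data, sigma, grid)

post_mean = sum(mu * p for mu, p in zip(grid, post))
post_sd = math.sqrt(sum((mu - post_mean) ** 2 * p for mu, p in zip(grid, post)))
print(f"posterior mean: {post_mean:.3f}, posterior width: {post_sd:.3f}")
```

The width of the posterior is the quantitative statement of how much freedom the data leave for adjusting the constant; the set of alternatives and the noise model are fixed before the data are looked at.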