Tuesday, April 04, 2006

Bruce Knuteson: automatic interpretation of data

Bruce Knuteson from MIT works for the CDF detector group at the Tevatron. He spends much of his time building detectors. Nevertheless, he gave a very intriguing talk that, in my opinion, deserves to be discussed.

First, he showed an optimistic picture of a two-year-old baby (himself), explaining that this was when the last significant discovery in experimental particle physics occurred. Then we voted on which type of new physics was most likely to be discovered first. There were roughly 10 groups to choose from. In the U.S., 34 percent of his audience votes for supersymmetry - SUSY is the clear frontrunner. At Harvard, most of us voted for "something else" - including me, although SUSY would clearly be my second choice and, under certain circumstances, my first choice.

Bruce also emphasized that most of the data from the Tevatron or the LHC is - or will be - unavailable to the phenomenologists. Nima argued that this restriction can be circumvented if one becomes an associate member of the collaboration.

Automatic interpretation of the LHC data

But back to the main topic of his talk. Bruce mentioned many interesting "philosophical" points about the way physics is actually done. He argued that physics is never done in isolated stages, in which you would first measure the cold data, then try to extract the anomaly, and finally try to find an interpretation and a theory. Instead, all of these things occur simultaneously.

When an experimenter has some data, she implicitly judges them according to four criteria:
• reliability of detectors - how certain you are that the apparatus worked properly; the experimenters' stupidity is negatively included in this rating
• statistical significance - how many standard deviations separate your result from what a statistical fluke could plausibly produce
• quality of predictions - how well you know that the predictions of the standard theory and the new theory have been correctly and accurately calculated
• quality of interpretation - how natural, motivated, and robust your theory explaining the new data is
With your subset of the collider data, you are effectively rated in all these categories. If you get poor grades in some of them, you had better compensate for them with good grades in other categories. Otherwise there is no good reason to promote and publish your data.

I am sure that Bruce Knuteson's picture correctly describes the procedure by which discoveries are made, at least in epochs when virtually no discoveries are made. ;-) Of course, during scientific revolutions, it is very normal that the breakthroughs get a high rating in only some of these categories. Well-established and highly motivated theories can work with very small numbers of events. On the other hand, some experiments can be so reliable that you find them interesting and solid before any interpretation emerges.

Bruce described an automatic bottom-up system to interpret the experimental collider data. Its goal is rather ambitious because it is a kind of machinery designed to replace all phenomenologists by computers. The system is based on three pieces of algorithms and software: Bard, Quaero, and Sleuth.

The strategy is to take all possible discrepancies from the Standard Model and divide them into boxes. These boxes are parameterized by discrete choices; all events in a box have the same kind of particles in the final state. Bard automatically constructs all conceivable tree-level diagrams (diagrams from an effective field theory: the vertices can, in principle, be non-renormalizable) that can be derived from a particular Lagrangian and that could potentially explain certain events.

There was some consensus in the rest of the room that this particular procedure is rather straightforward even if you do it without the computers - and that Bruce's count of how many kiloseconds and kilostudents you need to cover the parameter space was exaggerated.
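To make the "boxes" concrete, here is a minimal sketch of how events could be sorted by their final-state particle content. It is only an illustration of the idea as I understood it from the talk, not Bruce's actual code: the event format, the particle names, and the functions `final_state_label` and `sort_into_boxes` are all hypothetical.

```python
from collections import Counter, defaultdict


def final_state_label(particles):
    """Build an order-independent label such as '1e+ 2j' from a particle list."""
    counts = Counter(particles)
    return " ".join(f"{n}{name}" for name, n in sorted(counts.items()))


def sort_into_boxes(events):
    """Group events so that each box holds events with the same final state."""
    boxes = defaultdict(list)
    for event in events:
        boxes[final_state_label(event["particles"])].append(event)
    return dict(boxes)


# Toy events (hypothetical format): the first two differ only in ordering.
events = [
    {"id": 1, "particles": ["e+", "j", "j"]},
    {"id": 2, "particles": ["j", "e+", "j"]},
    {"id": 3, "particles": ["mu-", "mu+"]},
]
boxes = sort_into_boxes(events)
for label, members in boxes.items():
    print(label, [e["id"] for e in members])
```

The label is a discrete choice, so the boxes never overlap: every event lands in exactly one of them.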

Quaero is another level of the software - one that actually communicates with the collider data. It is a framework that assumes that high transverse momentum is where you see new physics. In each box of the events, it automatically chooses the subset of observables (functions of momenta) from a more general set. This subset is composed of the quantities that are most useful in determining the new physics. Bruce believes that this system can eliminate the need for dozens of papers whose main goal is to show which parameters are most important for describing regions inside the multi-dimensional parameter space - and which subregions of the simplified parameter spaces have been ruled out.
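As a rough illustration of what "choosing the most useful observables" can mean, the sketch below ranks candidate observables by how well they separate a toy new-physics sample from a toy Standard Model sample. This is not Quaero's algorithm, which I don't know in detail; the separation measure, the observable names `sum_pt` and `eta_lead`, and all the numbers are assumptions made up for the example.

```python
from statistics import mean, pvariance


def separation(signal, background):
    """Distance between the two means in units of the combined spread."""
    spread = (pvariance(signal) + pvariance(background)) ** 0.5
    if spread == 0:
        # No spread at all: perfectly separated unless the means coincide.
        return 0.0 if mean(signal) == mean(background) else float("inf")
    return abs(mean(signal) - mean(background)) / spread


def pick_observables(signal_sample, background_sample, keep=1):
    """Return the names of the `keep` observables with the largest separation."""
    scores = {
        name: separation(signal_sample[name], background_sample[name])
        for name in signal_sample
    }
    return sorted(scores, key=scores.get, reverse=True)[:keep]


# Toy samples (made-up numbers): summed transverse momentum in GeV and the
# pseudorapidity of the leading object.
signal = {"sum_pt": [310.0, 290.0, 330.0], "eta_lead": [0.1, -0.4, 0.5]}
background = {"sum_pt": [120.0, 140.0, 100.0], "eta_lead": [0.2, -0.5, 0.4]}

best = pick_observables(signal, background, keep=1)
print(best)
```

In this toy setup the summed transverse momentum wins easily, which matches the assumption that high transverse momentum is where new physics would show up.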

Bruce showed plots of a two-dimensional subspace of the parameter space of the MSSM. Even if you don't like the string theory landscape, be sure that the landscape of effective field theory is much messier and more chaotic: this is why it's not really called a landscape but rather a swampland. ;-)

There was an extensive debate between Bruce and Nima's phenomenology team about whether the software is actually useful for simplifying some of the calculations. If I summarize and vulgarize the content of the discussion, Nima believed that you need to have a good intuition to see which bumps in the data should be taken seriously and which bumps should not.

It is an art, not just statistics: sometimes you show your experience and intelligence by ignoring much bigger anomalies than those that you take seriously. You are only good at this art if you have some experience with the way the data behave in different situations - and an understanding of the "top-down" physics is very helpful.

Bruce's attitude was the opposite one: these processes can essentially be automatized. He's using these approaches to analyze the CDF data, most of which are secret. It's been proposed by many of us that his programs should be tested at the LHC Olympics to see whether this strategy is really useful in practice. Nima's attitude, as sketched above, is that when we deduce physics, we always use some combination of the bottom-up and the top-down approach. Bruce focused, as one would reasonably expect from an experimentalist, on the bottom-up approach.

Sleuth is a general strategy to look for new physics associated with a large transverse momentum. But because I could confuse the exact roles of these different algorithms and programs within the whole software, I will redirect you to the resources behind the links.

I feel that more or less everyone would agree that the things that can be automatized should be automatized. Sometimes, as Bruce showed, such an automatic procedure can lead to a higher confidence level of your results. Just like CDF's and D0's five-sigma discoveries of the top quark could have been combined into a seven-sigma discovery by Fermilab as a whole, you can integrate the hep-ph archive from 2000 to 2006 to get more accurate and reliable results.
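The "five plus five gives seven" arithmetic can be checked with a minimal sketch. It assumes that the two measurements are independent and equally weighted, so that their significances combine as the sum divided by the square root of their number (Stouffer's method). That assumption is mine, not something Bruce spelled out, and the function name `combine_significances` is just illustrative.

```python
import math


def combine_significances(z_values):
    """Combine independent, equally weighted significances (Stouffer's method)."""
    return sum(z_values) / math.sqrt(len(z_values))


# Two independent five-sigma observations, as in the top quark example.
combined = combine_significances([5.0, 5.0])
print(f"{combined:.2f} sigma")  # 5 * sqrt(2), roughly 7.07
```

Real combinations also have to deal with correlated systematic errors shared by the two experiments, which this sketch ignores.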

Nevertheless, it is likely that there will always be room for human creativity and intuition, even in experimental physics. When we have the correct pre-conceptions about the physics at very high energies, we will always be more likely to look at the "right" observables.

Bruce also offered more general and ambitious plans to generalize his evaluation algorithms to all of science. I guess that sciences that study arbitrary physical systems whose basic rules are fixed could be automatized in this fashion. However, when the laws are changing and you don't know the space of theories (analogous to the space of effective field theories) in which you should look, things become more subtle.

Related discussions

Bruce said that CDF is gonna publish the B-meson results in a week or so. He also endorsed Fermilab's open attitude concerning the minor tritium leak. More importantly, there are 80 anomalous events known at Fermilab but I can't tell you the details because I don't know them. ;-) Finally, his biggest nightmare is that the LHC will find something that should have been discovered at the Tevatron. He believes that this would convince every individual congressman to stop any funding of U.S. particle physics.

I tend to disagree.

I, for one, don't believe that the U.S. Congress would ever pass a resolution whose main idea is that the Americans can't compete with the rest of the world. Surely, if the Tevatron is ever shown to have missed something, it is because they have not received a sufficient amount of money. The possibility that the Europeans could be better at doing something will strengthen, not weaken, the drive of the U.S. politicians to support their scientists. Competition was what drove the development of advanced weapons in the U.S.; the cold war was among the main reasons why Reagan proposed the SSC, too. If Europe becomes a serious competitor, it will provoke America to do better.

But I may be wrong.