### Higgs: Does science require hiding data?

Collider-related, Wednesday: Lyn Evans was chosen as the director of the CLIC+ILC linear collider unification governance body. I think it's a very good choice.
Unofficial reports say that the Higgs signals after more than 5/fb of the 8 TeV 2012 data boast the same strength that you would expect based on the 2011 5/fb 7 TeV data.

Up to this point, almost 12.5/fb of data were delivered to each detector in the 2010-2012 runs and about 92%, or 11.5/fb, have been recorded.

In particular, the diphoton ($\gamma\gamma$) channel shows the same strength as it did in 2011, reaching 4 sigma per detector just from the 2012 data. This fact makes it more likely that at least one of the major detectors – ATLAS or CMS – will be able to announce a 5-sigma discovery at the ICHEP 2012 conference in Melbourne that starts on Independence Day and lasts for a week.

The probability of an "individual detector's Summer 2012 discovery" has increased substantially (discovery using one detector, a combination of channels – or one detector, the diphoton channel only). Although we don't know too many details, I would say that even the suspicion that the diphoton channel is stronger than expected from the Standard Model will increase, too.

Of course, the Higgs discovery – more precisely, a discovery of something that surely differs from a Higgsless part of the Standard Model and that is close to the Standard Model Higgs but may be something else (and may gradually acquire properties that are significantly and demonstrably different) – is just a formality. For people who follow the published data and sensibly evaluate them, the existence of a 124-126 GeV Higgs boson has been a sure thing at least since December 14th, 2011.

JollyJoker has pointed out the following article on these developments by Prof Dr Matt Strassler CSc:
New Higgs Rumors Have Arrived
JollyJoker has reacted to some of Matt's words, especially those about withholding scientific information and asymmetric ways to use the data. I still think that Matt is a highly technically potent scientist – but make no mistake about it, JollyJoker is right.

Various independent physicists and science fans – especially climate realists – know very well that I am not one of the most passionate advocates of the "publish all scientific data for everyone" movement although I usually agree with it – and this Higgs saga is no exception.
In fact, I think it would be great if the fresh data from all the LHC collisions could be made completely public and perhaps even user-friendly. We could easily find that there are many people outside the LHC physics teams who are able to write much cleverer, faster, and more accurate algorithms to deal with the information than the hired experimental physicists. And that would be a good thing – for science.

To write something balanced, I am ready to acknowledge the following justifications to withhold data in some situations:
• practicality – raw data may often be complicated and unreadable enough so that only an inner circle of scientists around some research understand them; it may be too much work (and an expensive procedure) to make them readable for everyone and scientists may naturally be unwilling to publish a mess
• credit – individual scientists or scientific teams often take credit for their discoveries that have some value; for obvious reasons, they don't want to publish data (and they have some moral right not to publish the data) from their work in progress that would help others to scoop them
• protection of scientists against laymen's attacks – scientists require some degree of peace and calm atmosphere to do their research impartially; data from work in progress may be enough for others to scream but may be still incomplete and the screaming may make the completion of the work harder; I could also write a separate point about the protection against invalid interpretations – it's still better if the data are mostly interpreted by experts because they have (or they should have) a higher chance not to draw wrong interpretations – but I will include this observation into this point, too.
All other arguments support the idea that freely accessible data and information are good for science.

The reasons are obvious. Science depends on reproducibility and verification. The more accessible the data are (incidentally, I prefer to insist that the data are a Latin noun in plural, while the singular is a datum), the more people are able to reproduce the research, verify it, and perhaps go beyond it. Freely available data are also good to make biased research less likely. If a group of scientists has a bias or an agenda, others may discover and correct this fact assuming that they have access to the data.

Prof Matt Strassler also wrote:
This is especially true since we learned last year that some well-known non-particle-physicist bloggers have information pipelines directly into the experiments.  It is perhaps inevitable that there are scientists who see it in their best interest to subvert the scientific process.
I only have direct onymous pipelines to theory and phenomenology – and occasionally, anonymous pipelines to experiments. My contacts in the ATLAS and CMS teams remain 100% silent when it comes to the communication about the LHC findings (people like Dorigo may need an extra discussion). The adjectives such as "non-particle-physicist" above suggest that the writer primarily talks about Not Even Wrong even though it's likely that the experimenters revealing the data to that website are anonymous, too (again, except for some CMS bloggers who probably reveal the information onymously yet privately).

It surely sounds crazy for me to defend Peter Woit but much like JollyJoker, I simply don't believe that the publication of information "subverts the scientific process" as long as we are talking about the "scientific process" that is highly compatible with good scientific manners. Quite on the contrary, science depends on knowledge – as much knowledge as we can get – so not having enough data always undermines one's ability to participate in the scientific process. The inaccessibility of information cools down science as much as liquid nitrogen.

Another point in which I unambiguously agree with JollyJoker is that one must use the data symmetrically when it comes to the refutation or confirmation of theories – and one should use all the relevant data. Cutting a subset of the data away from the picture means to deny some evidence – and this denial may be used to make the "Yes" answer or the "No" answer more convincing. It's wrong in both cases.

One may also mention that it's not too natural to use a subset of the data to take care of some possible problems – e.g. use the 2011 data to eliminate the look-elsewhere effect – but it may be done and similar strategies are common. Still, this attitude may heavily underestimate the strength of the signal because the 2011 signal near 125 GeV is much stronger than what is needed "just to erase" the look-elsewhere ambiguity.

There's one more point on which I agree with JollyJoker: Matt hugely exaggerates how difficult it is to combine various datasets to get an accurate enough idea about the strength of a signal. He even talks about the difference between 7 TeV and 8 TeV as being hugely complicated etc. But the error would be small even if you neglected the difference and interpreted all the collisions as 7.5 TeV collisions. Moreover, we know the (near) power laws by which various cross sections depend on the energy in the models we care about so we may easily be more accurate. Also, Phil Gibbs has shown that one may get visually indistinguishable combinations by the most straightforward formulae you may think of. They only start to break down when the confidence level is really small but in that case, there's not much to talk about, anyway. Matt is simply creating mysterious dragons where there are almost none.
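To make "straightforward formulae" concrete, here is a minimal sketch of one such formula: an inverse-variance weighted average of independent measurements. It is only an illustration of the kind of simple combination meant here, not Phil Gibbs's actual procedure and not what ATLAS or CMS do; the function name and the input numbers are hypothetical, and Gaussian, uncorrelated uncertainties are assumed.

```python
def combine_inverse_variance(measurements):
    """Combine independent (value, uncertainty) pairs.

    Assumes Gaussian, uncorrelated uncertainties. Returns the weighted
    mean and its uncertainty.
    """
    weights = [1.0 / (sigma ** 2) for _, sigma in measurements]
    total_weight = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, measurements)) / total_weight
    uncertainty = total_weight ** -0.5
    return value, uncertainty


# Hypothetical signal-strength estimates from two datasets (illustrative only,
# not real LHC numbers): 1.2 +- 0.4 and 0.8 +- 0.4.
mu, err = combine_inverse_variance([(1.2, 0.4), (0.8, 0.4)])
print(f"combined: {mu:.2f} +- {err:.2f}")
```

With two equally precise inputs, the combined uncertainty shrinks by a factor of sqrt(2), which is the same square-root behavior that governs the growth of a signal's significance with more data.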

There are many things to discuss here but JollyJoker is right in his major points. It's not good for science if a selected ad hoc group of scientists is given enough room to hide some data (e.g. new data) and publish other data and if it is given a monopoly to interpret them (new data as well as old data) in their preferred way. This is not what good science should look like because such secrecy and such a monopoly reduce the efficiency and balance in the evaluation of the data and in the determination of their consequences. And if this secretive "scientific process" may be undermined, it's a good thing to undermine it, indeed.

Remotely related: This touching story about a Second World War widow searching for her husband (6-minute video) – which culminates in a bombshell – may also teach us a lesson about the results of withholding the information and some people's "monopoly" (or "self-believed monopoly") over this information. Thanks to Gene for the link.

One of the reasons of keeping the data until formal publication that you have not listed is a sociological one that has very much to do with the HEP experimental community.

As you know, organizing good physicists is like herding cats. It is no different whether they are experimentalists studying the data or trying to get the last bits of accuracy out of their solid state detector baby: they have this "inventor urge", this "what new can we find here" urge, otherwise they would not have gone into physics.

If you have noticed, the ATLAS and CMS experiments have about 3000 physicists pulling the cart each. These physicists are not there for light support and decoration. Your demeaning term "hired" that lowers them to technician level is unacceptable. There would be no data if each, or at least most, of them were not dedicated with a monastic one-track concentration to the job at hand. Good data need good physicists.

Completely open data, from the accelerator to the web page available to the theorists, need a complete reorganization of the model on which HEP experimental physics has been running up to now: with students working for their doctoral thesis, i.e. with an original piece of research for it, in exchange for endless hours of shifts and hard work babysitting and jollying detectors.

It happened with the bubble chambers. The engineers took over and we just had to take pretty pictures home to do our physics analysis. Both CMS and ATLAS could be run by engineers, which of course would increase the price tag of the data since their salary scale is much higher than a graduate student's. It could be done, but it needs a change of paradigm in the academia.

As it is, the experiments are committed to those graduate students, in exchange for their labor, to provide them with the opportunity of original research for their thesis. Opening the data right now to everybody would make thousands of theses coming from LHC experiments redundant, and unacceptable, since somebody on the net could have published the results from the same data in the blink of an eye.

Dear Anna, apologies, I probably didn't understand this "additional" justification of the secrecy.

The word "hired" was used to convey my opinion that the experimental physicists are not the "owners" and therefore the "ultimate determinants" of the LHC project. If the word sounds as if I am saying that they're "just employees", then this is exactly what I wanted to say.

Concerning the last paragraph, if someone on the net could do the job more efficiently than a team whose hard work is done by some grad students, then again apologies, but this means that the graduate students wouldn't really be at the top as experts.

Can I summarize your comment by saying that you are trying to promote protectionism and suppression of competition in science? Because that's the only way I am able to interpret your argument in favor of secrecy. Needless to say, I am totally against such policies.

One could perhaps tolerate that someone gets a PhD for something that someone else, a random person on the Internet and a non-PhD, could do more efficiently – although this picture of PhDs already looks problematic. But what I am not ready to tolerate is to slow down science just in order for some people who don't really deserve a degree to get a degree.

If you think that the primary things are the degrees, then give them to the people even though someone else can do the same thing without a degree or independently. But please don't suppress the accessibility of the scientific data as a side effect – this accessibility is surely much more important than a PhD degree for a not maximally competent person.

Dear Lumo,
that would be very nice if I could have the smileys with the Disqus comments too ... :-)))
Concerning the reply structure, I would appreciate that too, if you could do it ;-). But it is your decision, maybe you can just work on it and try it if you feel like trying it in the course of time until October 1st. Maybe some wise people will have good suggestions on how it can be done more efficiently too ... If it works: very nice :-D; if it can't be done: it would be a pity but not a cosmological catastrophe ;-)

All I am saying Lubos is that in order to open the data base for the existing experiments a complete reorganization of how experimental high energy physics can be done is needed, which is not feasible: If the present student experts leave, changing fields for their thesis, there will be no data because the experiments depend on them.

You could plan for the next experiment, ILC, to organize it from the beginning on the lines of "open data to all" organizing maybe a free market: "best brains get first analysis" according to some measure. The present experiments are with the old paradigm and commitments.

Maybe in 200 years a physicist could sit at his/her terminal and plan experiments that robots will be carrying out. That would be ideal.

This story made me realize this. Imagine you make an experiment having a new signal with 3 sigma. Not enough for discovery. Then I do the same thing, again with 3 sigma. And then 20 other folks do the same thing with 3 sigma. But because all the experiments were made by different people, you don't get to combine them, so the signal remains without enough significance, while a combination would give 10 sigma. It is very stubborn, isn't it? Is CERN trying to make this story longer just in case they don't find anything else besides the Higgs? We live in the XXI century, information does not travel like in 1981! When will these people realize this?

Sorry, Anna, I just don't follow. The point here was just to make the data available. You are turning it into destruction of high-energy physics. WTF?

People at the LHC are paid – or given degrees – for various analyses of the data and other insights. None of these things depends on the data's being secret. Also, I don't see any glimpse of evidence that the students would have to leave if the data became open. They're there for their positive attitude to physics, because of their desire to work on it, and they progress in their careers etc. by doing it well enough, relative to whatever is the relevant competition. Yes. They may also have competition – sometimes unpaid competition – but how would it lead a grad student to leave? Maybe an uncompetitive student would leave, but is that what you meant?

It's a good thing if the research is done by the most appropriate folks and it's a bad thing if some people play a game that they're the best folks to do something in the world even though they're not. This is, once again, a typical situation in the climate science but it's wrong if it happens anywhere, in any field.

What you write just makes no sense to me. It may be due to the completely different cultural backgrounds of the two of us. Your comment is of the type "protectionism and other left-wing distortions of the society are so paramount that everyone would die without them". I don't believe a letter of this junk, sorry.

Also, this discussion has nothing to do with robots and I don't think that it's "ideal" if robot scientists will replace humans as researchers because humans would lose the feeling of personally touching the face of God. I am surprised by this comment of yours as well. It surely sounds as though you must dislike physics if you find it "ideal" to leave it to robots, doesn't it?

By the way I find the new Higgs post of Prof. Strassler indeed a bit too cautious and politically correct.

Dear Mikael, the abuse report is a bug of DISQUS that should be fixed by them soon. That's a message for everyone who sees it - the right message is just "your comment awaits moderation".

At any rate, you won't see it because I just whitelisted you, too.

Right, it's overcautious and PC. PC may sound nice but it hurts many people, just different people, for example Phil Gibbs in this case who has invested the highest amount of expertise-loaded time into the combinations etc. And his work is just being dismissed without a trustworthy justification.

Very true. In your case, the estimated confidence would be sqrt(21)*3 sigma ≈ 13.7 sigma. And yes, it's plausible that some of the delays may be deliberate. However, it's also plausible that the discovery will simply be announced in early July.
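The arithmetic behind that estimate can be sketched in a few lines. This is a rough back-of-the-envelope calculation that assumes the experiments are statistically independent, equally sensitive, and in the Gaussian regime, so that their significances add in quadrature; the function name is just illustrative.

```python
import math


def combined_significance(z_values):
    """Add independent significances (in units of sigma) in quadrature."""
    return math.sqrt(sum(z * z for z in z_values))


# 21 independent experiments, each seeing a 3-sigma excess,
# as in the estimate sqrt(21) * 3 sigma.
z_total = combined_significance([3.0] * 21)
print(f"combined significance: {z_total:.1f} sigma")
```

For N equal inputs the quadrature sum reduces to sqrt(N) times the individual significance, which is the formula used in the reply. Real combinations also have to deal with correlated systematic errors, which this sketch ignores.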

It may happen that the LHC will find nothing else; however, it's far from guaranteed. Every time you triple your overall number of collisions (integrated luminosity), things that were invisible because below 2 sigma may become 3 sigma - signals out of "nothing" we saw before - and 3 sigma may become 5 sigma. That's just how it works. So as long as you approximate the luminosity growth by an exponential of time, the rate/chance of new discoveries per unit time is pretty much constant.
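A small sketch of this scaling, assuming the statistics-dominated case in which the significance of a real signal grows like the square root of the integrated luminosity (fixed energy and analysis, negligible systematics). The function names and the e-folding time are hypothetical placeholders.

```python
import math


def scaled_significance(z, luminosity_factor):
    """Expected significance after multiplying the integrated luminosity
    by luminosity_factor, assuming Z grows like sqrt(luminosity)."""
    return z * math.sqrt(luminosity_factor)


def tripling_time(e_folding_time):
    """Time needed to triple the luminosity if it grows as exp(t / e_folding_time)."""
    return e_folding_time * math.log(3.0)


# Tripling the data multiplies the significance by sqrt(3), about 1.73.
for z in (2.0, 3.0):
    print(f"{z:.0f} sigma -> {scaled_significance(z, 3.0):.2f} sigma")
```

Under exponential growth the tripling time does not depend on how much data has already been collected, which is why the chance of a new excess crossing a threshold per unit time stays roughly constant.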

Yes, we do seem to have different data bases when we discuss. For example when I am talking of robots I am not saying they would do the physics ! The physicist would be sitting behind a terminal designing the experiment and guiding the robots!

In Greece, for a thesis to be awarded, there has to be an "original investigation" in it, something nobody else has published. We had great trouble convincing old professors that there could be original ideas in thesis from experiments with 300 participants.

I do not think that this originality requirement is a Greek invention. It is a contract: students provide cheap scientific labor in order to get their hands on the data and a good thesis.

If a student finds that his/her thesis will not be approved because there exists already a publication from the same data on the same subject, what would be his/her choices other than getting out from HEP as soon as possible and get to a field that offers the opportunity of original research?

Anyway, let us agree to disagree, though I am not against a future planning on the lines of completely open data, if possible.

If I felt that Physicists would withhold information my interest would drift away from cool physics...

Dear Anna, aren't robots already doing almost all the mechanical work for us - in experiments and elsewhere?

In Greece, research for a PhD must be original. That's great. This rule has a very good reason, you know. It's a part of the standards that reserve a degree, which only some people have, for people who can do something that not everyone can do.

But if the research is original only because other people in the world are prevented from doing the research, then you're cheating the rule. If that's so, you may as well change the Greek law and distribute PhDs for non-original research, too. It would still be better for science and society.

Again, I think you have offered no evidence at all that the openness of the data changes anything about "contracts" that grad students are participating in or their salaries.

"We had great trouble convincing old professors that there could be original ideas in thesis from experiments with 300 participants." - Well, the old professors surely had a point, at least in many cases, didn't they? But it's probably easier in Greece and elsewhere to change what people consider reality, right?

Concerning the latest point, I am also surprised by the sentence:

"..out from HEP as soon as possible and get to a field that offers the opportunity of original research?"

Do they want research that is truly original or research that is fabricated by various bans and suppression of information so that it may be called original? Which one do the students want? Let me give you an answer; the good students want genuinely original research, the students who suck want research that just pretends to be original.

Moreover, my general comments are not specific to HEP, not at all. Some fields impose some standards that are reflected in the reality - they really educate students who have special skills and knowledge - and others don't. The only way to interpret your sentence about the exodus from HEP is that you want students to go to fields that don't have any genuine standards, whose expertise doesn't really make one more skillful or smarter than others.

Well, such students may indeed go to easier fields where it's guaranteed by artificial criteria that research is original, special, and appreciated even if it is not. But such students who end up as climate scientists shouldn't be surprised when others notice that they are just pseudointellectual pretentious worthless trash.

