Wednesday, June 20, 2012

Higgs: Does science require to hide data?

Collider-related, Wednesday: Lyn Evans was chosen the director of the CLIC+ILC linear collider unification governance body. I think it's a very good choice.
Unofficial reports say that the Higgs signals after more than 5/fb of the 8 TeV 2012 data boast the same strength that you would expect based on the 2011 5/fb 7 TeV data.

Up to this point, almost 12.5/fb of data were delivered to each detector in the 2010-2012 runs and about 92%, or 11.5/fb, have been recorded.

In particular, the diphoton (\(\gamma\gamma\)) channel shows the same strength as it did in 2011, reaching 4 sigma per detector just from the 2012 data. This fact makes it more likely that at least one of the major detectors – ATLAS or CMS – will be able to announce a 5-sigma discovery at the ICHEP 2012 conference in Melbourne that starts on Independence Day and lasts for a week.

Yes, over 50% of TRF readers are Americans.

The probability of an "individual detector's Summer 2012 discovery" has increased substantially (discovery using one detector, a combination of channels – or one detector, the diphoton channel only). Although we don't know too many details, I would say that even the suspicion that the diphoton channel is stronger than expected from the Standard Model will increase, too.

Of course, the Higgs discovery – more precisely, a discovery of something that surely differs from a Higgsless part of the Standard Model and that is close to the Standard Model Higgs but may be something else (and may gradually acquire properties that are significantly and demonstrably different) – is just a formality. For people who follow the published data and sensibly evaluate them, the existence of a 124-126 GeV Higgs boson has been a sure thing at least since December 14th, 2011.

If you wanted to say that your humble correspondent seems to be always right and always ahead of others, then you're very correct, too! ;-)

JollyJoker has pointed out the following article on these developments by Prof Dr Matt Strassler CSc:
New Higgs Rumors Have Arrived
JollyJoker has reacted to some of Matt's words, especially those about withholding scientific information and asymmetric ways to use the data. I still think that Matt is a highly technically potent scientist – but make no mistake about it, JollyJoker is right.

Various independent physicists and science fans – especially climate realists – know very well that I am not one of the most passionate advocates of the "publish all scientific data for everyone" movement although I usually agree with it – and this Higgs saga is no exception.
Off-topic, related to HEP physics: Media are full of the BaBar measured deviation from the Standard Model today. I wrote about it already a month ago...
In fact, I think it would be great if the fresh data from all the LHC collisions could be made completely public and perhaps even user-friendly. We could easily find that there are many people outside the LHC physics teams who are able to write much cleverer, faster, and more accurate algorithms to deal with the information than the hired experimental physicists. And that would be a good thing – for science.

To write something balanced, I am ready to acknowledge the following justifications to withhold data in some situations:
  • practicality – raw data may often be so complicated and unreadable that only an inner circle of scientists around some research understands them; it may be too much work (and an expensive procedure) to make them readable for everyone and scientists may naturally be unwilling to publish a mess
  • credit – individual scientists or scientific teams often take credit for their discoveries that have some value; for obvious reasons, they don't want to publish data (and they have some moral right not to publish the data) from their work in progress that would help others to scoop them
  • protection of scientists against laymen's attacks – scientists require some degree of peace and a calm atmosphere to do their research impartially; data from work in progress may be enough for others to scream but may be still incomplete and the screaming may make the completion of the work harder; I could also write a separate point about the protection against invalid interpretations – it's still better if the data are mostly interpreted by experts because they have (or they should have) a higher chance not to draw wrong interpretations – but I will include this observation in this point, too.
All other arguments support the idea that freely accessible data and information are good for science.

The reasons are obvious. Science depends on reproducibility and verification. The more accessible the data are (incidentally, I prefer to insist that the data are a Latin noun in plural, while the singular is a datum), the more people are able to reproduce the research, verify it, and perhaps go beyond it. Freely available data are also good to make biased research less likely. If a group of scientists has a bias or an agenda, others may discover and correct this fact assuming that they have access to the data.

Prof Matt Strassler also wrote:
This is especially true since we learned last year that some well-known non-particle-physicist bloggers have information pipelines directly into the experiments.  It is perhaps inevitable that there are scientists who see it in their best interest to subvert the scientific process.
I only have direct onymous pipelines to theory and phenomenology – and occasionally, anonymous pipelines to experiments. My contacts in the ATLAS and CMS teams remain 100% silent when it comes to the communication about the LHC findings (people like Dorigo may need an extra discussion). The adjectives such as "non-particle-physicist" above suggest that the writer primarily talks about Not Even Wrong even though it's likely that the experimenters revealing the data to that website are anonymous, too (again, except for some CMS bloggers who probably reveal the information onymously yet privately).

It surely sounds crazy for me to defend Peter Woit but much like JollyJoker, I simply don't believe that the publication of information "subverts the scientific process" as long as we are talking about the "scientific process" that is highly compatible with good scientific manners. Quite on the contrary, science depends on knowledge – as much knowledge as we can get – so not having enough data always undermines one's ability to participate in the scientific process. The inaccessibility of information cools down science as much as liquid nitrogen.

I just wanted to embed this old LHC video so how should I have introduced it?

There have been several debates on whether or not the ATLAS and CMS physicists "possess" the data from their detectors in the copyright sense. I find such a suggestion incredible. The LHC has cost something like $10 billion and the data from the detectors are the only "products" from the collider. So it makes sense to conjecture that the total value of the data over the career of the LHC is $10 billion or so, too. Even if you divide it among the 6,000 ATLAS and CMS physicists, you get roughly $1.7 million per person. Was each of the LHC physicists given a gift that is this expensive? I don't believe so. They are just serving the public – and especially the scientific public – and were hired to handle the data that belong to the taxpayers, the main sponsors of the LHC.

It's true that they got the right to hide their data but the justification can't be that they possess the data in the copyright sense. The arguments favoring secrecy must belong to the list of three "secrecy may be good" arguments. To claim that secrecy is essential for the scientific process is preposterous. A scientific collaboration may have some secretive internal rules and the members may cherish them – it's OK as long as the rules are legal – but it's pretentious and dishonest to sell these secretive rules as "principles of science" which they're surely not.

Asymmetric usage of rumors, data

Also, Matt Strassler is trying to find a way to escape from his previous, long-held positions that the Higgs boson was remaining uncertain even after December 2011 and so on – because he actually realizes that the newest data probably make the "No Higgs near 125 GeV" attitude truly indefensible. It seems clear to me that the rumors are actually being used in his text, perhaps including the numbers on the statistical significance. Still, he tries to hide those from others and he doesn't even tell his readers what the statistical significance seems to be. I just don't think it's fair.

Agent Higgs is a cool $1 roadblock puzzle-style game for your iPhone or iDevice. Buy via iTunes. I've actually bought it, nice – and I am at level 10 now. More words here. Meanwhile, acknowledging the formidable European competition, the Fermilab has changed its primary specialization and welcomed five new bison calves. Via Katie Yurkewicz (Twitter).

Another point on which I unambiguously agree with JollyJoker is that one must use the data symmetrically when it comes to the refutation or confirmation of theories – and one should use all the relevant data. Cutting a subset of the data away from the picture means denying some evidence – and this denial may be used to make the "Yes" answer or the "No" answer more convincing. It's wrong in both cases.

One may also mention that it's not too natural to use a subset of the data to take care of some possible problems – e.g. use the 2011 data to eliminate the look-elsewhere effect – but it may be done and similar strategies are common. Still, this attitude may heavily underestimate the strength of the signal because the 2011 signal near 125 GeV is much stronger than what is needed "just to erase" the look-elsewhere ambiguity.

There's one more point on which I agree with JollyJoker: Matt hugely exaggerates how difficult it is to combine various datasets to get an accurate enough idea about the strength of a signal. He even talks about the difference between 7 TeV and 8 TeV as being hugely complicated etc. But the difference is small even if you neglect it and interpret all the collisions as 7.5 TeV collisions. Moreover, we know the (near) power laws by which various cross sections depend on the energy in the models we care about so we may easily be more accurate. Also, Phil Gibbs has shown that one may get visually indistinguishable combinations by the most straightforward formulae you may think of. They only start to break down when the confidence level is really small but in that case, there's not much to talk about, anyway. Matt is simply creating mysterious dragons where there are almost none.
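
As a rough illustration of how simple a back-of-the-envelope combination can be, here is a minimal Python sketch. It assumes that the datasets are statistically independent and that each significance is approximately Gaussian, so that the combined significance is the quadrature sum of the individual ones. This is a textbook approximation, not necessarily the formula Phil Gibbs used, and the input numbers are placeholders rather than ATLAS or CMS results. The function name `combine_in_quadrature` is made up for this example.

```python
import math


def combine_in_quadrature(significances):
    """Combine independent Gaussian significances (in sigma) in quadrature."""
    return math.sqrt(sum(z * z for z in significances))


# Placeholder inputs: two independent datasets, each showing a 3-sigma excess.
combined = combine_in_quadrature([3.0, 3.0])
print(f"combined significance: {combined:.2f} sigma")
```

A real combination would also have to handle correlated systematic errors and the energy dependence of the cross sections, which this sketch deliberately ignores.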

There are many things to discuss here but JollyJoker is right in his major points. It's not good for science if a selected ad hoc group of scientists is given enough room to hide some data (e.g. new data) and publish other data and if it is given a monopoly to interpret them (new data as well as old data) in their preferred way. This is not what good science should look like because such secrecy and such a monopoly reduce the efficiency and balance in the evaluation of the data and in the determination of their consequences. And if this secretive "scientific process" may be undermined, it's a good thing to undermine it, indeed.

And that's the memo.
Remotely related: This touching story about a Second World War widow searching for her husband (6-minute video) – which culminates in a bombshell – may also teach us a lesson about the results of withholding the information and some people's "monopoly" (or "self-believed monopoly") over this information. Thanks to Gene for the link.

Well, if you haven't noticed, Echo comments may still be displayed but no new Echo comments are being accepted. Use DISQUS only.

I needed to stop the changes to the Echo comments files because they add lots of mutations and random errors to the migration process. Lots of hours of my CPU time – plus lots of my manhours – are needed to fill the missing URLs to the 73,842 Echo comments – the most important non-automatic part of the migration process.

You may pray that it will essentially work tonight. If that is the case, I will improve the XML file to import the Echo comments to DISQUS just a little bit, and that will probably be the last import of Echo data to DISQUS. So the only question I leave to a vote is whether I should re-enable new Echo comments after the import – with the condition that all the new Echo comments would be lost without a trace on October 1st. My vote is No.

At this moment, it seems that with a few exceptions, the Echo comments should appear under the right blog entries as DISQUS comments, with the correct author names (including Guests whose author names had to be inserted manually into 2,000 comments where any name was absent), but they will probably not have the right avatars and they will not be "possessed" by the corresponding DISQUS user.

The pictures attached to Echo comments should be stored, too – with a link to a Dropbox copy of these 171 files added to the relevant Echo-turned-DISQUS comments. Let's see whether it works.


I've completed a test import into a different account. Some random threads - less than 10% or so, e.g. Celebrating Grassmann Numbers - seem not to show up. I don't know why. Otherwise everything is OK except that the reply structure (nesting) isn't preserved. This method of import doesn't have the potential to do so. Also, all the Echo comments will show up as comments from named but "anonymous" users with the universal default avatar. I am giving myself some time to figure out what to do with these imperfections but at some moment, I may give up and import the XML with these problems.


  1. I've briefly read through Prof. Strassler's new post yesterday already ... I interpreted his comment about the pipeline exactly in the same way and it turned my stomach ... :-(.
    Even though I agree with Lumo that data and results should not be hidden, I think that PW and TD do what they do rather in order to get more attention and publicity (and of course they have an agenda...) than for science ... :-/

    Now I'm curious about what Jolly Joker said at Prof. Strassler's site ... :-)

  2. Dear Dilaton, sorry for having disabled new Echo comments and your yellow smileys - missing so much. ;-) If you want to re-enable, I may, but I won't do a selective import again so all new Echo comments posted after now would be lost on Oct 1st...

    Right, rumors are spread by publicists – and maybe even by the good guys – mostly to get more attention and to acquire the label of the guy or the babe who knows what's shaking...

  3. Dear Lubos,
    I don't find the hiding of LHC information too bad in this case because it is only temporary until the official announcement. A scientist may always choose to publish some result immediately or wait until it is more complete.

    I am against opening the Echo comments again although I also miss Dilaton's smileys.

    I don't get what are the up and down arrows you can click on below each post. Is it some vote system?

  4. Dear Mikael, 0 uparrow downarrow below each post is a voting system, indeed. There are some other cool gadgets around. You should also register with DISQUS, it's fun.

    Your point on the "truth will come out in this case" is understood. However, sometimes the timing and speed may matter, too.

    I just tested the import on a testing DISQUS account. It works well, including the attached images. However, all Echo comments will show up as avatar-free anonymous DISQUS comments.

    Moreover, the reply structure of the Echo comments will be lost. The DISQUS' importing program ignores the Echo parent-guid structure although the latter is very transparent.

    Should I try to struggle to restore the reply structure? Or should I just go and import the Echo comments without avatars and without the reply structure?

  5. Ok, let me check the DISQUS registration later today.

    Well, since you ask me, I'd say there is no time pressure right now to finish the comments migration to DISQUS until Echo is switched off so I'd say you could try at some time for a more perfect result. But I understand you want to complete this boring task as soon as possible. So I am very happy that the content of the comments is not lost which is the most important thing.

  6. Strassler doesn't seem to have deleted my comment, although it's a bit of a rant.

  7. Strange, in random browsing, I found out that some postings such as

    don't show any comments after import. I hope that the error isn't too widespread... I looked at 10 pages or so after the import was fully completed (in the testing account) and only this one had missing comments.

  8. One of the reasons of keeping the data until formal publication that you have not listed is a sociological one that has very much to do with the HEP experimental community.

    As you know, organizing good physicists is like herding cats, it is not different if they are experimentalists studying the data or trying to get the last bits of accuracy out of their solid state detector baby, they have this "inventor urge", "what new can we find here urge" otherwise they would not have gone into physics.

    If you have noticed, the ATLAS and CMS experiments have about 3000 physicists pulling the cart each. These physicists are not there for light support and decoration. Your demeaning term " hired" that lowers them to technician level is unacceptable. There would be no data if each, or at least most, of them were not dedicated with a monastic one track concentration to the job at hand. Good data need good physicists.

    Completely open data, from the accelerator to the web page available to the theorists, needs a complete reorganization of the model on which HEP experimental physics has been running up to now: with students working for their doctoral thesis, i.e. with an original piece of research for it, in exchange for endless hours of shifts and hard work babysitting and jollying detectors.

    It happened with the bubble chambers. The engineers took over and we just had to take pretty pictures home to do our physics analysis. Both CMS and ATLAS could be run by engineers, which of course would increase the price tag of the data since their salary scale is much higher than a graduate student's. It could be done, but it needs a change of paradigm in the academia.

    As it is, the experiments are committed to those graduate students, in exchange for their labor, to provide them with the opportunity of original research for their thesis. Opening the data right now to everybody would make thousands of theses coming from LHC experiments redundant, and unacceptable, since somebody on the net could have published the results from the same data in the blink of an eye.

  9. Dear Anna, apologies, I probably didn't understand this "additional" justification of the secrecy.

    The word "hired" was used to convey my opinion that the experimental physicists are not the "owners" and therefore the "ultimate determinants" of the LHC project. If the word sounds as if I am saying that they're "just employees", then this is exactly what I wanted to say.

    Concerning the last paragraph, if someone on the net could do the job more efficiently than a team whose hard work is done by some grad students, then again apologies, but this means that the graduate students wouldn't really be at the top as experts.

    Can I summarize your comment by saying that you are trying to promote protectionism and suppression of competition in science? Because that's the only way I am able to interpret your argument in favor of secrecy. Needless to say, I am totally against such policies.

    One could perhaps tolerate that someone gets a PhD for something that someone else, a random person on the Internet and a non-PhD, could do more efficiently – although this picture of PhDs already looks problematic. But what I am not ready to tolerate is to slow down science just in order for some people who don't really deserve a degree to get a degree.

    If you think that the primary things are the degrees, then give them to the people even though someone else can do the same thing without a degree or independently. But please don't suppress the accessibility of the scientific data as a side effect – this accessibility is surely much more important than a PhD degree for a not maximally competent person.

  10. Dear Lumo,
    that would be very nice if I could have the smileys with the DISQUS comments too ... :-)))
    Concerning the reply structure, I would appreciate that too, if you could do it ;-). But it is your decision; maybe you can just work on it and try it if you feel like it in the course of time until October 1st. Maybe some wise people will have good suggestions on how it can be done more efficiently too ... If it works: very nice :-D; if it can't be done: it would be a pity but not a cosmological catastrophe ;-)

  11. All I am saying Lubos is that in order to open the data base for the existing experiments a complete reorganization of how experimental high energy physics can be done is needed, which is not feasible: If the present student experts leave, changing fields for their thesis, there will be no data because the experiments depend on them.

    You could plan for the next experiment, ILC, to organize it from the beginning on the lines of "open data to all" organizing maybe a free market: "best brains get first analysis" according to some measure. The present experiments are with the old paradigm and commitments.

    Maybe in 200 years a physicist could sit at his/her terminal and plan experiments that robots will be carrying out. That would be ideal.

  12. This story made me realize this. Imagine you make an experiment having a new signal with 3 sigma. Not enough for discovery. Then I do the same thing, again with 3 sigma. And then 20 other folks do the same thing with 3 sigma. But because all the experiments were made by different people, you don't get to combine them, so the signal remains without enough significance, while a combination would give 10 sigma. It is very stubborn, isn't it? Is CERN trying to make this story longer just in case they don't find anything else besides the Higgs? We live in the XXI century, information does not travel like in 1981! When will these people realize this?

  13. Sorry, Anna, I just don't follow. The point here was just to make the data available. You are turning it into destruction of high-energy physics. WTF?

    People at the LHC are paid – or given degrees – for various analyses of the data and other insights. None of these things depends on the data's being secret. Also, I don't see any glimpse of evidence that the students would have to leave if the data became open. They're there for their positive attitude to physics, because of their desire to work on it, and they progress in their careers etc. by doing it well enough, relative to whatever is the relevant competition. Yes. They may also have competition – sometimes unpaid competition – but how would it lead a grad student to leave? Maybe an uncompetitive student, but is that what you meant?

    It's a good thing if the research is done by the most appropriate folks and it's a bad thing if some people play a game that they're the best folks to do something in the world even though they're not. This is, once again, a typical situation in the climate science but it's wrong if it happens anywhere, in any field.

    What you write just makes no sense to me. It may be due to some completely different culture background of the two of us. Your comment is of the type "protectionism and other left-wing distortions of the society are so paramount that everyone would die without them". I don't believe a letter of this junk, sorry.

    Also, this discussion has nothing to do with robots and I don't think that it's "ideal" if robot scientists will replace humans as researchers because humans would lose the feeling of personally touching the face of God. I am surprised by this comment of yours as well. It surely sounds as though you must dislike physics if you find it "ideal" to leave it to robots, doesn't it?

  14. By the way I find the new Higgs post of Prof. Strassler indeed a bit too cautious and politically correct.

  15. Dear Mikael, the abuse report is a bug of DISQUS that should be fixed by them soon. That's a message for everyone who sees it - the right message is just "your comment awaits moderation".

    At any rate, you won't see it because I just whitelisted you, too.

    Right, it's overcautious and PC. PC may sound nice but it hurts many people, just different people, for example Phil Gibbs in this case who has invested the highest amount of expertise-loaded time into the combinations etc. And his work is just being dismissed without a trustworthy justification.

  16. Very true. In your case, the estimated confidence would be sqrt(21)*3 sigma ≈ 13.75 sigma. And yes, it's plausible that some of the delays may be deliberate. However, it's also plausible that the discovery will simply be announced in early July.

    It may happen that the LHC will find nothing else; however, it's far from guaranteed. Every time you triple your overall number of collisions (integrated luminosity), things that were invisible because they were below 2 sigma may become 3 sigma - signals out of "nothing" we saw before - and 3 sigma may become 5 sigma. That's just how it works. So as long as you approximate the luminosity growth by an exponential of time, the rate/chance of new discoveries per unit time is pretty much constant.

  17. Yes, we do seem to have different data bases when we discuss. For example when I am talking of robots I am not saying they would do the physics ! The physicist would be sitting behind a terminal designing the experiment and guiding the robots!

    In Greece for a thesis to be awarded there has to be an "original investigation" in there, something nobody else has published. We had great trouble convincing old professors that there could be original ideas in thesis from experiments with 300 participants.

    I do not think that this originality requirement is a greek invention. It is a contract: students provide cheap scientific labor in order to get their hands on the data and a good thesis.

    If a student finds that his/her thesis will not be approved because there exists already a publication from the same data on the same subject, what would be his/her choices other than getting out from HEP as soon as possible and get to a field that offers the opportunity of original research?

    Anyway, let us agree to disagree, though I am not against a future planning on the lines of completely open data, if possible.

  18. I am to submit a report on this niche your post has been very very helpfull auto glass

  19. If I felt that Physicists would withhold information my interest would drift away from cool physics...

  20. Dear Anna, aren't robots already doing almost all the mechanical work for us - in experiments and elsewhere?

    In Greece, research for a PhD must be original. That's great. This rule has a very good reason, you know. It's a part of the standards that reserve a degree, which only some people have, for people who can do something that not everyone can do.

    But if the research is original only because other people in the world are prevented from doing the research, then you're cheating the rule. If it's so, you may very well change the Greek law and distribute PhDs for non-original research, too. It would still be better for science and the society.

    Again, I think you have offered no evidence at all that the openness of the data changes anything about "contracts" that grad students are participating in or their salaries.

    "We had great trouble convincing old professors that there could be original ideas in thesis from experiments with 300 participants." - Well, the old professors surely had a point, at least in many cases, didn't they? But it's probably easier in Greece and elsewhere to change what people consider reality, right?

    Concerning the latest point, I am also surprised by the sentence:

    "..out from HEP as soon as possible and get to a field that offers the opportunity of original research?"

    Do they want research that is truly original or research that is fabricated by various bans and suppression of information so that it may be called original? Which one do the students want? Let me give you an answer; the good students want genuinely original research, the students who suck want research that just pretends to be original.

    Moreover, my general comments are not specific to HEP, not at all. Some fields impose some standards that are reflected in the reality - they really educate students who have special skills and knowledge - and others don't. The only way to interpret your sentence about the exodus from HEP is that you want students to go to fields that don't have any genuine standards, whose expertise doesn't really make one more skillful or smarter than others.

    Well, such students may indeed go to easier fields where it's guaranteed by artificial criteria that research is original, special, and appreciated even if it is not. But such students who end up as climate scientists shouldn't be surprised when others notice that they are just pseudointellectual pretentious worthless trash.

  21. Well done Lubos, this comment section is working much better now! Keep the good job =)

  22. Thanks! Just to be sure, the "abuse report" message should have read "your comment is awaiting moderation". The wrong text is a bug that should be fixed soon.

  23. No need to feel that way. It is the rate at which information is released that is under discussion, and whether immediate access to raw data by anybody would improve the rate and the quality of information.

  24. I hope the world remembers the name of Billy D. Harris, a very ordinary Texan who gave his life to help end the Nazi scourge. We owe more to these young men than can ever be repaid.

  25. in Greece you might spend 8 years and not get a PhD and it is normal to spend 8-10 years for an undergraduate degree. the good thing if data became more public is that there would be more competition to produce better results from those data than a few people having them and not facing competition about the speed and quality of work with it.

    i have to submit a masters in 4-5 days from a PhD i got kicked out of 4 years ago and i was working with catalogues that anyone can download from the internet because it is the only thing i have. they tell me that the research i have done for the masters is not original. if i could have more data, maps etc. i would be able to work on other things too but i can't since i cannot have access to more data.

    when Nokia was the only company that had access to certain hardware it was the best mobile phone company. after other companies got access to the hardware, they became better than Nokia that did not develop as good software that made the difference to the consumer. the same protectivism happens in academia. if all those guys paid with their money to get the results ok...but they did not use their money for it.
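
The arithmetic in the exchange above about combining many 3-sigma results, and about how significances grow with the integrated luminosity, can be checked with a short Python sketch. It assumes independent, Gaussian, statistics-dominated measurements, so that the significance grows like the square root of the amount of data. Both helper names are hypothetical and exist only for this illustration.

```python
import math


def combined_equal(n_experiments, sigma_each):
    """N independent experiments with equal significance combine as sqrt(N) * sigma."""
    return math.sqrt(n_experiments) * sigma_each


def scaled_by_luminosity(sigma_now, luminosity_factor):
    """Expected significance after the integrated luminosity grows by the given factor."""
    return sigma_now * math.sqrt(luminosity_factor)


print(round(combined_equal(21, 3.0), 2))       # 21 experiments at 3 sigma each
print(round(scaled_by_luminosity(2.0, 3), 2))  # a 2-sigma hint after tripling the data
print(round(scaled_by_luminosity(3.0, 3), 2))  # a 3-sigma hint after tripling the data
```

Under these assumptions, tripling the data multiplies any significance by sqrt(3), about 1.73, which is why a 2-sigma hint can be expected to pass 3 sigma and a 3-sigma hint to pass 5 sigma if the signal is real.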