Thursday, September 04, 2014

Kaggle Higgs: view from Mt Everest

Update Sep 16: ninth place, people couldn't compete against the machine learning gurus who knew what they were doing from the beginning. I am / we are ninth at the end. Also, the winner has 3.805 (although everyone else is below 3.8) so I apparently lose a "below 3.8" $100 bet. Heikki is very lucky, isn't he? ;-)

A minor update Sep 15: I just wanted to experience the fleeting feeling of our team's look from the top of the preliminary leaderboard where we (shortly?) stand on the shoulders of 1,791 giants.

You see the safe gap of 0.00001 between us and the Hungarian competition. ;-)

Today, the "public" dataset of 100,000 events will be replaced by a completely disjoint (but statistically equivalent) dataset of 450,000 "private" ATLAS collisions and our team may drop like a stone, although that is far from guaranteed. And even if it doesn't drop like a stone, there will be a huge hassle in getting everyone convinced that the code has all the characteristics it should have. I am actually not 100% sure whether I want to remain in the top 3 because I dislike paperwork and lots of "small rules".

Text below was originally posted on September 4th as "Kaggle Higgs: back to K2"

The ATLAS Kaggle Higgs contest ends in less than two weeks, on September 15th or so, and I wanted to regain at least the second place among the 1,600 contestants seen in the leaderboard – because I still believe that it is unlikely for me to win a prize.

After many, many clever ideas and hundreds of attempts, my team returned to the second place, where I had already been for one hour in June.

Gábor Melis is ahead of my team by 0.005. I am learning Hungarian in order to revert this gap.

I've tried about a hundred amusing ideas to improve the recognition of the Higgs signal events. Maybe, after the contest ends, I will write about 10 blog posts with some examples of these ideas and code.

Today, to regain the preliminary silver place, I was considering the 38,000 testing.csv events that have undefined values of the ATLAS MMC (missing mass calculator) Higgs mass. The undefined value of the Higgs mass is a big hint that the collision hasn't really produced a Higgs boson. It doesn't look like a Higgs boson so it probably doesn't quack like one, either. The percentage of the "signal" events among the "undefined MMC events" is small.
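The claim that signal is rare among the undefined-MMC events can be checked on the labeled training file. Here is a minimal sketch, assuming the public challenge file layout in which the mass column is named `DER_mass_MMC`, undefined values are encoded as `-999.0`, and the `Label` column holds `s` or `b`. The helper name `undefined_mmc_signal_fraction` and the five-row sample are made up for illustration and are not real challenge data.

```python
import csv
import io

UNDEFINED = -999.0  # assumed sentinel meaning "the MMC mass could not be computed"


def undefined_mmc_signal_fraction(rows):
    """Return (number of undefined-MMC events, fraction of them labeled 's')."""
    undefined = [r for r in rows if float(r["DER_mass_MMC"]) == UNDEFINED]
    if not undefined:
        return 0, 0.0
    n_signal = sum(1 for r in undefined if r["Label"] == "s")
    return len(undefined), n_signal / len(undefined)


# Tiny invented sample standing in for training.csv.
sample = io.StringIO(
    "EventId,DER_mass_MMC,Label\n"
    "1,-999.0,b\n"
    "2,125.3,s\n"
    "3,-999.0,b\n"
    "4,-999.0,s\n"
    "5,91.2,b\n"
)
count, fraction = undefined_mmc_signal_fraction(csv.DictReader(sample))
print(count, round(fraction, 3))
```

On the real file one would pass `csv.DictReader(open("training.csv"))` instead of the sample. This is an unweighted count; the contest score itself uses the per-event weights.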

So last night, on a trip, I was thinking that maybe the xgboost software does a poor job and classifies too many "undefined MMC collisions" as signal. The number of true signals over there is so tiny that you won't lose much if you just remove all these events from your "list of signal candidates" i.e. if you reclassify all of them as background. And indeed, you do not lose too much score if you do so, just about \[

\Delta{\rm AMS} = -0.005

\] However, if xgboost had been working poorly and most of the "undefined MMC" events it flagged had been false positives, removing them could have improved the score by as much as \(0.2\). That would have been cool. Unfortunately, xgboost does a good job of filtering these events.
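The trade-off can be made concrete with the contest metric. Here is a rough sketch, assuming the challenge's published definition of the approximate median significance, \({\rm AMS} = \sqrt{2\left((s+b+b_r)\ln\left(1+\frac{s}{b+b_r}\right)-s\right)}\) with the regularization \(b_r=10\), where \(s\) and \(b\) are the weighted sums of true signal and true background events among the selected candidates. All the numbers below are invented for illustration and are not taken from any real submission.

```python
import math

B_REG = 10.0  # regularization term from the challenge's AMS definition


def ams(s, b, b_reg=B_REG):
    """Approximate median significance for weighted signal s and background b."""
    return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))


def ams_after_removal(s, b, s_removed, b_removed):
    """AMS after a group of selected events is reclassified as background."""
    return ams(s - s_removed, b - b_removed)


# Hypothetical selected region (weighted sums, made up).
s, b = 700.0, 30000.0
base = ams(s, b)

# Removed group is relatively rich in true signal: the score drops.
signal_rich = ams_after_removal(s, b, s_removed=5.0, b_removed=100.0)

# Removed group is almost pure false positives: the score rises.
background_rich = ams_after_removal(s, b, s_removed=1.0, b_removed=400.0)

print(signal_rich < base, background_rich > base)
```

For \(s \ll b\) the metric behaves roughly like \(s/\sqrt{b}\), so dropping a group helps only when its signal-to-background ratio is below about half of the overall ratio \(s/b\). A classifier that already filters the undefined-MMC events well leaves no such group to drop.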

My improvement of the score that came 20 minutes later is closely related to the considerations above but I can't tell you exactly how the improvement was done. ;-)


  1. A quote from Prof. Cox a couple of years ago:
    Every electron around every atom in the universe must be shifted as I heat the diamond up to make sure that none of them end up in the same energy level. When I heat this diamond up all the electrons across the universe instantly but imperceptibly change their energy levels.

    I recently gave a lecture, screened on the BBC, about quantum theory, in which I pointed out that “everything is connected to everything else”. This is literally true if quantum theory as currently understood is not augmented by new physics.

    Now, many (including Lubos) have pointed out flaws in Brian's understanding of QM. However, that's not what I think is relevant here.

    What I find amazing about this is the bit I emboldened. Brian apparently thinks that something is "literally true" if some current theory says that it is true, but is prepared to change this truthness if the theory changes.

    With this world view, it was "literally true" that the Sun orbited the Earth prior to it being shown to be not true. And CAGW is also "literally true" - and hence his view that "you can’t know better" - until you do, that is.

  2. You are nothing short of amazing!!! Go Lumo go! %-D

    BTW, I and Jonnie (wife) visited a cafe (in Djurgården, Stockholm) that displayed Swedish flags and a tempting sign promising traditional waffles with strawberry jam and cream; when we walked in the whole place was an advertisement for Czech beer and the place was apparently owned and staffed by your countrymen (2) - who hardly spoke an understandable word of Swedish! ;-}}

  3. You have many more entries than others, so you might be overfitting. The analysis with full data will tell...

  4. Dear Roby, I know it's often being said and I sometimes fear a similar thing.

    But rationally speaking, I think that your comment is a misconception. The preliminary (as well as the final) score is evaluated from a collection of events that is completely independent from the training.csv dataset. So if you need to overfit to get a high preliminary leaderboard score, you really have to do it event by event.

    In particular, if you take a submission and change some (say 100) events in a group randomly, you almost always worsen the score because it's more likely for the original ordering of these events - that was computed at least by an approximately useful algorithm - to be more accurate than the randomly reshuffled one.

    So even if you post 10,000 submissions which are obtained by truly random modifications of your initial submission, you have pretty much no chance to improve your preliminary score.

    What one is really modifying are various parameters in the algorithm. But if one gets a higher preliminary leaderboard score by such an adjustment, it's more likely than not that one increases the final score, too. There's really no reason why such an adjustment of parameters should adapt your submission to special patterns in the 100k preliminary test.csv events - it's more sensible to think that one is adjusting actual parameters that should be adjusted.

    So the final score may differ from the preliminary one but at the end, I think that this difference should be almost exactly independent from the number of other submissions because everyone is effectively doing the same thing, regardless of the number of submissions.

    To argue in the same way a bit differently, one submission really tells me just O(1) bits of information about the assignment of "s" and "b" labels in test.csv. So by making these submissions, I really get just O(500) bits of information about the 100k events used to calculate the preliminary scores. And that isn't really enough to explain a 0.1 gap above others.

    You know, if you could use the submissions to learn the labels of 500 particular events from the 100k events, and what I am doing isn't really enough for that, you could remove just 50 or so false negatives, raising your score by something like 0.02. That's irrelevant.

    Instead, by making many submissions, one is really learning some parameters describing the right statistical distribution, and that *is* shared by the events that will be used to calculate the final score.

  5. How about just splitting into defined / undefined MMC and optimizing the parameters separately? Presumably the undefined ones need tighter filtering to be flagged as Higgses.

    Also, would the real positives among the 38k be very few (say one or four) ? Then a completely separate optimization for them could be a bad idea.

  6. Dear JollyJoker, I have thought many times about whether it's good to train the discrete subsets separately (this applies both to the undefined MMC and to the number of jets, which is 0, 1, 2, or 3) but I never tried it because I tend to think that it couldn't help.

    You know, the events with an undefined MMC are otherwise very similar to those with a well-defined MMC, and also have a similar distribution of the signal over the space of the remaining parameters. So you worsen your statistics to accurately determine the dependence of the probability on these remaining parameters if you train the undefined MMC separately.

    Moreover, you would have to calculate the absolute probability in some way because at the end, you would have to combine the discrete subsets correctly. You know, relative rank in the subgroup isn't a good estimate for the absolute rank - the undefined MMC events are almost universally lousy. So what should their final score in your 1 submission be etc.?

    In the case of the number of jets, it's similar. Some dependence of the probability is universal. For very high-energy extra jets, the probability of signal goes up intensely. But I sort of think that the boosted trees catch these things well enough. If the probability distributions are very different for different numbers of jets, the algorithm will divide the trees in this way, anyway. If the probability depends on the other parameters similarly independently of the number of jets, it's better to have greater statistics.

  7. I think Cox is just using "literally" to distinguish his statement from the commonplace statement that everything is connected with everything else, e.g., that you are connected with distant relatives, that something manufactured in Mississippi may be used in Iceland, etc. Similarly, if someone prior to Galileo were to call it "literally true" that the sun orbits the earth, he would mean that he was not merely speaking of appearances or some poetry which might say that the sun orbits the earth. So I think your concern with "literally" is misplaced.

  8. OK, I did not know that the score was computed on a different sample. Indeed, this should mostly prevent overfitting.

  9. "You know, the events with an undefined MMC are otherwise very
    similar to those with a well-defined MMC, and also have a similar
    distribution of the signal over the space of the remaining parameters."

    Hmm. I would have guessed this wouldn't be the case. Then just (somehow) taking into account that they're less likely to be signal is probably better than treating them separately. Given your (very clear) water/landscape analogy, treating the undefined MMCs as if they were otherwise in the same spot on the same landscape but a little higher. And the defined MMCs as if they were a tiny bit lower? Of course, you have the same problem as

    "Moreover, you would have to calculate the absolute probability in some way because at the end, you would have to combine the discrete subsets correctly."

    since nothing a priori tells you how much less likely an undefined MMC event is to be signal.

    I think I understand the optimizing of false negatives vs false positives well enough.

    Anyway, hope you still get some new ideas for the win! :)

  10. Lubos, you may wanna glance this as an anger remedy:

  11. That was 996 at the end there, not 9. SORRY

  12. Wow, 95% confidence - that dude must be an idiot.

    As you imply, Lubos, it would be very hard to find a highly skilled physicist who thinks GW is anything but a very minor concern for mankind’s future, with the caveat of course that she/he/it not be receiving GW funding. So the question to me is why is this shit ubiquitous? Seems clear that the GW harping is a window into Power’s nature. That politicians love it is obvious as their power becomes almost limitless if voters will only swallow the cool aid, but how many in the media truly believe the “narrative”? Given how they love their “truth to power” and “objectivity” bromides, it seems the lack of any hard-hitting articles detailing the many fallacies in the GW mythology indicates that the media is overwhelming composed of scientific imbeciles wandering about in a very dense fog. It really is becoming pointless to pay any attention to the MSM.

  13. Thanks, Tom. Imagine, 95% "certainty" that the LHC doesn't destroy the Earth. Given the fact that we've seen 20 comparable experiments, it's a small miracle that we're still here. We must try an additional 20. ;-)

    It seems very clear to me that competent uncorrupt physicists, if they back the panic at all, do so because they are happy that some people politically and by their job "close enough to them" get very influential. David Gross was once very specific about it when he said: Isn't it great that science is (he meant scientists are) powerful that when they say something, hundreds of billions of dollars of funding are redirected?

  14. Professor Cox may be an airhead but he certainly looks good. If I were that way inclined I'd definitely be interested in a physical relationship.

  15. Hi Anders. That's a rather long yet interesting post by someone who appears to be a very experienced and competent astrophysicist. I like that he invites readers to comment and/or point out errors in his analysis at the end. When I have more time I intend to go through it all in more detail. In fact, his post appears to be largely based on a paper he published in the 'Journal of Cosmology.' (Not a normal journal for CC/GW-type articles but seems appropriate enough for an astrophysics paper such as this [see article 6]:

    My dilemma however is how one could hope to reconcile this analysis with another recent paper that appears to be based on an equally in-depth analysis by a few chaps from CSIRO:

    Could you or Lubos perhaps explain which of these papers makes erroneous conclusions, or how they might be reconciled?

  16. Mark, indeed. Same paper I linked to in my response to Anders below.

  17. He iust just the girl with big tits giving the weather report on TV. The only thing the girl knows about weather is that she has big tits.

    Cox is very similar but he can talk sciency while just delivering someone else's words. Soon we are to have a one legged, lesbian black African to be the UN spokesman on climate change and we already have a bunch of cartoon movies showing the weather 40 years hence.

    It's all just PR Lubos.

  18. Nice anger blog

  19. "I am learning Hungarian in order to revert this gap."
    Hmmm, that may prove harder than winning the contest :)
    Gabor is likely channeling Neumann János Lajos

  20. Did you see the improvement also on the training data?

  21. That’s about it, the lust for power will corrupt just about everyone - the set of those it won’t is nowhere dense.

  22. 'Literally' is one of those meaningless moronic filler words beloved of semiliterates — literally, yeah! :)

    To claim something is "literally true" tells one something about the speaker. It strongly suggests that he is distinguishing that particular claim of his from his regular output which clearly consists solely of truths which are not "literally true" — in other words they're outright lies, or to be more charitable about it, they're figurative truths at best, so one can confidently dismiss them as so much verbal diarrhoea.

    In Cox's case then, speaking figuratively I'd say he was shit factory. But speaking literally I say he's a right c###.

  23. Dear Luboš,

    "So please, the editors in The Guardian, kindly notice that Brian Cox is full of šit. Thank you very much."

    While I fully agree with the sentiment you express there, I am more than surprised you would make such a request of the Guardian, and so politely too. Unbelievable!

    Have you gone nuts? Are you sure you didn't bang your head when that dog knocked you off your bike?

    Listen, if I were you I'd get myself off to the local hospital pronto for a thorough check-up. :)

    If Timothy McVeigh were still around, the Guardian offices would be one of the places I'd strongly suggest he put on his 'to-do' list. It would be a charitable act.

  24. Repeating a lie often enough deserves a good beating. If GWA claims were not pure BS they would be citing references in scientific research supporting their argument instead of trying to vilify and discredit the scientists and scientific debate being posited totally discrediting their view of global warming as settled science.

    Kudos to scientists around the world who are taking these charlatans on.

  25. Dear Nim, nope, I am not even measuring the local score this accurately and I don't think it would make sense.

    This small change by 0.002 is due to a very small number of events, perhaps 1 among the 100k, and of course it may occur by chance, and I spent many days doing these adjustments for mostly cosmetic reasons - getting the 2nd place in the preliminary leaderboard for a day or 10 days is fun.

    But I am not cherry-picking events individually by their IDs - it's always adjustments to statistical distributions.

    And even if it is just ad hoc adjustments that improve one's score, it's important that one isn't just "picking special features" of the 100k events. More democratically, one should say that one is (either objectively improving the algorithm or magic formula, well for any training.csv and test.csv file or) correcting for the differences between the 250k training events and those 100k preliminary test events. Note that I said "difference" because this has two sides. By doing these adjustments, one may equally well remove the biases arising from statistical flukes in the training.csv 250k dataset. And I think that this part of the interpretation is actually more likely (for greater improvements).

    Do you know whether the typical change of the scores will really be 0.08 (or Pythagorean hypotenuse of 0.08 and 0.04)? I am still uncertain but in recent weeks, I was increasingly leaning to the opinion that the jump (or drop) in the scores will be much more uniform among the competitors.

  26. Haha, because of von Neumann or many others, I should also learn Hebrew. Or maybe Werbeh - I don't know in which order the letters should be spelled yet. ;-)

  27. I literally agree with you here. ;-)

  28. The Oatmeal explains this in an hilarious cartoon:

  29. U better aim for K1 as the failure rate of K2 is 20%.
    U beat K3 which has a death rate of 22% so don't stop now go for 29000 feet.

  30. LOL, I should. Now and last night, I made two small improvements, but the gap between the leader Melis and me/us is now 0.00047:

    I admit that I was inclined to think that 3.85 wouldn't have been surpassed even on the preliminary board two months ago, and now I did it myself. ;-)

  31. Hah, the minimal visible edge in score on the last day :)

    The Cake thing on the forums seems like it could be a threat. I assume you've used up your submissions for the last day?


  32. A humble Luboš Motl is the Enola Gay cruising over Hiroshima. Gotta wear shades!

  33. Hi JollyJoker, I have one submission for free but I won't even download the cakes because 589 submissions of experience tells me that it is bullshit hype.

    Whether one gets an increase or decrease while adding new smooth functions of features as new features is pure chance. You may see users reporting improvements by 0.02, and some report worsening by 0.02. There is no physics in it.

    I've tried about 80 nonlinear features that were more optimized to divide "s" and "b" more quickly and clearly but I learned that this is bullshit effort - this is like manually doing something that the computer does automatically and much more cleverly.

    If they could solve the full task of analytically calculating the probability that it is "b" or "s" which uses *all* the features and which takes into account all the 3 backgrounds, not just the Z-boson, that would be something. But what they did here is a cool QFT exercise with no implication whatever for the task that the contestants are supposed to solve.

  34. Thanks, Dilaton! Unfortunately, the dropping won't be affected by me anymore but by the universal laws of gravity (and, let's hope, other forces as well LOL).

  35. :)

    Stick it to 'em, Luboš!

    The world needs decent heroes.

  36. Is this a "contest" or are they using a bunch of super bright people to do something for them that will pay off far more for them than the contest winners?

  37. A bit of both I suspect, but there is nothing wrong with that.

  38. "...our team's look from the top of the preliminary leaderboard where we (shortly?) stand on the shoulders of 1,791 giants."

    This reminds me of a reported quotation from Sidney Coleman--: "If I have seen farther than others, it is because I have stood between the shoulders of dwarfs." :)

  39. Dare I note, didn't you write, only a couple of days ago, about insignificance of seemingly significant numbers, dates inclusive? Why choose the Independence Day over any other date? Boy are you contradictory and "so full of šit", as an unoriginal blogger writes in his declining (s)crap(e)book. But hey $500 will get you that 3rd wheel on the Moskvitch, so you can take your boyfriend out to dinner. It's been years; he complained thus in a blog and it wasN't Even Wrong.

  40. Congrats. Not only did you win but u did it with the slimmest possible margin on the last day which is a feat in itself.

    Better buy some oxygen and hot tea for the descent. The most dangerous part of the climb is the descent. Focus, a warm base camp awaits.

    May God bless you in your next conquest.

  41. LOL, I didn't win. My team is 9th according to the official evaluation using the private dataset. But I wanted to win at least the preliminary round. The real one was just not accessible to ML-non-professionals, and if I didn't fight to win the preliminary one, it wouldn't have helped the real one, anyway.

  42. If we know (as far as we can tell) that the vacuum is not the lowest energy vacuum possible, does that tell us anything about the initial big bang? That is, how could the universe get into this state in the first place? And if the vacuum were to flip to a lower energy, what would the 'excess' energy appear as?