If you follow the preliminary leaderboard of the Higgs ATLAS Kaggle contest, where 1,288 teams from various places of planet Earth are competing, you may have noticed that I have invited Christian Veelken of CERN to join my team. He kindly agreed. I believe he is one of the best programmers reconstructing the Higgs properties from the tau-tau decays in the CMS team, the other big collaboration at CERN aside from ATLAS, whose folks organize the competition.
The current decision is that, since the viable scores have so far been obtained predominantly by me, I own 90% of the team, which is enough not to have to ask the minority shareholders whether they like the name of the team. ;-) Of course, that may change in the future. My belief is that the relative importance of members of such a team has to be based on the preliminary scores and their contributions to the high ones. It's not a perfect way to rate things but it's better than all others, for reasons I could explain. This question is analogous to the question of whether managers' incomes in companies should depend on the profits, revenues, and the stock price. Even though there are risks and things can go wrong, I would answer Yes because this arrangement, rooted in imperfect yet measurable data, at least guarantees some correlation between the salary and the future of the company and some motivation for the manager to fundamentally improve things.
For the first time in human history, Christian has applied the CMS methods for evaluating these tau-tau decays (SVFIT) to the ATLAS data, the data of his intra-CERN competitors. It works. So far, it doesn't produce detectable improvements in the AMS score by itself (or in combination with the ATLAS methods): SVFIT, although more sophisticated, behaves almost identically to ATLAS' MMC. Christian has some really professional ideas about what to do, and I also believe that if they fail to produce high scores, he will help me to professionalize the codes that I used to get where I/we seem to be because, as you can imagine, the codes have become messy.
Meanwhile, however, I kept on improving the score. Our best one currently stands at 3.83674, just 0.014 below the current leader, Gábor Melis. That gap is exactly equal to my last improvement, and I got two such improvements in the last 24 hours, so feel free to estimate how much time it should take to take over. ;-)
There have been moments when my mood was one of resignation. It seemed impossible to reach the heights of the leaders and the progress was so slow (my jumping up by 1 place ahead of the marijuana guy is the only change in the top 16 during the last week). One simply couldn't have thought about beating Melis, Salimans, or even the marijuana guy – bright kids and men with years of experience in manipulating similar data and doing machine learning.
Without much kidding, my life's only experience with manipulating "big data" was the conversion of 80,000 Echo comments on this blog to the DISQUS platform when Echo went out of business three years ago or so.
But the mood is very different now. It seems that I can add 0.01 to the score more easily than I can prepare coffee. It's almost as easy as writing +1-1 at the end of a command y=f(x). ;-) Well, not quite, but it is almost mechanically straightforward and it has repeatedly (but not quite always) worked.
One of the proprietary ideas that I've been fond of from the beginning, and that I made more viable by refining certain functions, became even more effective when I realized what other conditions of the evaluation are probably needed for the idea to become truly efficient, to show its muscles.
Because this explanation seems to be justified by some abstract theoretical thinking as well as the real-world empirical data, I will probably automate the system and try to prepare a submission without self-evident fine-tuning that could produce a very high score immediately.
Now it even seems plausible to me that the final scores (which will be computed from 2 submissions per team, evaluated against 450k test events not included in the 100k test events that are the basis of the preliminary leaderboard) could exceed 3.8, so that I will lose a $100 bet. But it's too early to tell. The bet is as open as it can get. Note that the "best score per team" is almost certainly an overestimate of the final score because the preliminary AMS scores contain some noise with a standard deviation of 0.08. So with 300+ submissions, like mine, the best preliminary score could actually be up to a 3-sigma, i.e. 0.24, overestimate of the genuine score. There are some reasons to think that the overestimates aren't this brutal but I don't want to go into technicalities that are partly speculative, anyway.
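The "up to 3 sigma" estimate can be checked with a toy Monte Carlo. The sketch below is only an illustration under loud assumptions that are not in the text above: every submission is pretended to have the same genuine score, and the leaderboard noise is pretended to be independent and Gaussian from one submission to the next. The function name `expected_best_noise` and its parameters are made up for this example. Real submissions are evaluated on the same 100k events and are therefore correlated, which is one reason the true overestimate may be milder.

```python
import random


def expected_best_noise(n_submissions, sigma, n_trials=2000, seed=0):
    """Average, over many simulated contests, of the largest noise term
    among n_submissions independent Gaussian fluctuations of width sigma.

    Assumption (hypothetical toy model): all submissions share the same
    genuine score, so the best preliminary score exceeds it by exactly
    the largest noise draw.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # Best-looking submission in this simulated contest.
        total += max(rng.gauss(0.0, sigma) for _ in range(n_submissions))
    return total / n_trials


if __name__ == "__main__":
    sigma = 0.08  # noise of the preliminary AMS score quoted above
    for n in (1, 10, 100, 300):
        bias = expected_best_noise(n, sigma)
        print(f"{n:4d} submissions: typical overestimate {bias:.3f} "
              f"({bias / sigma:.2f} sigma)")
```

In this toy model the typical overestimate for 300 submissions comes out a little under 3 sigma, so the 0.24 figure is best read as a pessimistic upper estimate rather than the expected bias.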