Briefly, some news on Friday, August 1st. As I expected (see the text below), Tim Salimans is now ahead of Gábor Melis although his advantage is infinitesimal. (Friday 4 pm update: Melis is at the top again.) With the so far minor help of incredible variables from Christian Veelken of CMS, I (or counting the promised 10% share for CV, we) joined the club of those with the score above 3.8, see the leaderboard. Every contestant who is not a complete loser must feel safely above 3.8, so the score associated with my name is now 3.80007. ;-) I am not selecting that submission for the contest because I don't have all the sources that produced it – it was very complicated.
The text below was originally posted on July 24th.
Gábor Melis' new formidable challenger
Tim Salimans makes the Terminator look like Pokémon
As recently as two hours ago, I thought it was conceivable that I would end up in the top three of the Higgs Kaggle challenge. See the leaderboard.
The top 5 contestants hadn't changed for a week. Gábor Melis was at the top followed by the Marijuana Hybrid guy, by your humble correspondent, and by 1,100+ other participants.
Terminator, Ironman, Batman, and a few Transformers as seen from the optics of a company in Utrecht.
Times are changing. For more than an hour, Tim Salimans of Utrecht, the Netherlands has been the new #2 warrior. His 7th submission with the score 3.81888 catapulted him to that place and made the victory of Gábor Melis uncertain.
Almost all contestants at the top are experienced machine learning software experts – your humble correspondent is a true rural bumpkin in this company (my experience with machine learning and computers is that I managed to jump a few trains on Subway Surfers and solve a hard level at Candy Crush Saga after 500 attempts) – but Tim Salimans makes even most of the urban contestants look like bumpkins.
Just to be sure, his Kaggle profile says that he has won (!) 4 previous Kaggle contests, including one on dark matter data, finished 2nd several times, placed in the top ten 10 times in total, and has hosted his own Kaggle contest, too. More shockingly, he is
[a f]ounding partner and data scientist at predictive analytics consulting firm Algoritmica, with a PhD in computational Econometrics and a strong academic background in Machine Learning.
The company website, algoritmica.nl, explains that
Algoritmica combines machine-learning algorithms with the power of supercomputers to build unparalleled predictive models for marketing, risk, fraud, supply chains, and maintenance. We lead companies around the world from average business processes to a truly data-driven organization. Empowered with predictive models, these companies learn from data to stay ahead of the competition, cut waste, and delight customers.
Algoritmica also supervises the NSA and FBI and keeps track of all the data and patterns in the 2 trillion telephone calls and e-mails that they record every month.
OK, I added the last sentence but it may be true, anyway.
Salimans seems to have no specific training in physics but it's clear that he does care what the LHC collision data mean. In a question he had posted to the Kaggle forums, he was asking where he could find the algorithm used by the ATLAS Collaboration to estimate the Higgs mass from the candidate event. This is a rather difficult calculation whose result, the MMC mass, is the first "feature" describing each event and by far the most complicated "derived" quantity calculated from the raw collision data.
I am pretty sure that by today, he has incorporated the improved version of the MMC mass estimator into his supercomputer superprograms. In fact, I find it likely that he has added CMS' not-so-frequently used alternative to the MMC estimator, the (N)SVFIT algorithm, as well, and adding the (N)SVFIT output as an extra feature may help one jump above 3.8 even if other things are lousy. I was thinking about adding (N)SVFIT but it's a rather complicated program that I would have to reverse-engineer and rewrite from scratch and, you know, two hours ago, I felt that I would be the only contestant to waste my time in this way.
Whatever exactly Salimans has done, I feel that it's ludicrous to try to compete with such a monster. My mobilization against him is only going to be as symbolic as the Czechoslovak army's mobilization against the Third Reich right after the Munich Betrayal, in September 1938. ;-) My code and software infrastructure are based on several legs and lots of cute partial ideas. But I don't even have any systematic "quality control", like strictly dividing the training dataset into training and validation subsets. I am sure that he not only does so but does so dynamically, with some meta-machine-learning that adjusts the learning computer to make it learn better than the previous programs, and so on. The possibilities are endless.
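For readers wondering what the simplest form of that "quality control" looks like, here is a minimal sketch of a fixed holdout split. It is an illustration only, not anyone's actual contest pipeline: the function name `holdout_split`, the 20% validation fraction, the seed, and the event count in the usage line are all assumptions made for the example.

```python
import numpy as np

def holdout_split(n_events, validation_fraction=0.2, seed=0):
    """Split event indices 0..n_events-1 into disjoint training and validation sets.

    All parameter names and default values are hypothetical choices for this sketch.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_events)           # random order of all event indices
    n_valid = int(round(n_events * validation_fraction))
    valid_idx = shuffled[:n_valid]                 # held out, never used for fitting
    train_idx = shuffled[n_valid:]                 # used to fit the classifier
    return train_idx, valid_idx

# Hypothetical usage with an arbitrary number of labeled events.
train_idx, valid_idx = holdout_split(1000)
```

The point of the split is that the score measured on the validation events is not flattered by overfitting to the training events, so it should be a fairer preview of the leaderboard score.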
Of course, it's great if really powerful guys like this one do their job and switch from econometrics to particle physics at least once. I hope that it will be useful for the LHC research, too. If his (or other commercial professionals') methods are significantly more effective than those at the LHC, I believe that CERN should simply hire them or buy their software etc. to perform similar tasks. If the LHC experimenters are "clearly amateurs" in comparison, they should admit it and CERN should fire some of them and replace them with true professionals.
On the other hand, if his AMS score got stuck at the current level just 0.03 above the score of your humble correspondent who is doing all these things with $0 software on a $500 laptop and with 0 pre-contest experience with machine learning software, it would be rather stupid to pay millions and millions of dollars to a special Dutch machine learning company designed to conquer the world. ;-)
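For context on what these scores measure, the following is a small sketch of the AMS ("approximate median significance") as I understand it from the challenge documentation: AMS = sqrt(2 ((s + b + b_reg) ln(1 + s / (b + b_reg)) - s)), where s and b are the weighted sums of true and false positives in the selected region and b_reg = 10 is a regularization term. The function and argument names, and the numbers in the usage line, are illustrative assumptions.

```python
import math

def ams(s, b, b_reg=10.0):
    """Approximate median significance for weighted signal s and background b.

    s: sum of weights of signal events classified as signal
    b: sum of weights of background events classified as signal
    b_reg: regularization term (10 in the challenge definition, as far as I recall)
    """
    return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))

# Hypothetical numbers: for s much smaller than b, the AMS is close to s / sqrt(b + b_reg).
example = ams(100.0, 1000.0)
```

Because the score behaves roughly like s / sqrt(b) when the signal is small, a difference of a few hundredths corresponds to a rather modest change in the selected signal and background.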
My respect for the Dutchmen's sophistication is immense and they have my condolences after the downing of the MH17 flight. However, it's probably more natural for me to root for the fellow Austrian-Hungarian Visegrád guy now. Gábor, István, Balázs, Jánosz, go, go, go! ;-)