Tuesday, June 17, 2014

ATLAS race: conquering K2 for the first time

If someone happens to follow the ATLAS Higgs Contest Leaderboard occasionally, she may have noticed that among the 677 athletes or teams, your humble correspondent jumped to 2nd place early in the morning. (Too bad: an hour later, the T.A.G. team jumped above me with a score higher than mine by 0.00064, so I am third again.)

This is how I imagine the formidable competitors: lots of powerful robotics and IT under thick shields, boasting the ability to transform from one form to another, consuming terawatts of energy, and so on. Most of them have probably competed in numerous similar contests. The only "big data" programming I have done in my life was the reformatting of the 80,000 Echo comments on this blog a few years ago. I haven't written many smart programs in the last 25 years, and none of them was in the languages that contemporary programmers typically like to use.

But even a programming cripple like that is allowed to compete. In this sense, ATLAS and Kaggle are more welcoming than the KFC branch in Mississippi that ordered a 3-year-old girl injured by pit bulls to leave the restaurant because she was scaring the other customers away.

At any rate, my most recent jump above the score of 3.76 came from the most conservative modification of my code ever. I believe that the general code that had yielded my previous personal record of 3.74 became more effective at eliminating various b-labeled (background) tails near the signal-rich regions, which is why it prefers a slightly lower percentage of s-labeled (signal) events if you want a high enough score. So I lowered that percentage (imagine changing a number from 20.0% to 19.7%), and the AMS score jumped by 0.016 or so.
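To make the "percentage of s-events" knob concrete, here is a minimal Python sketch. It assumes the AMS definition used in the HiggsML challenge, \(\mathrm{AMS} = \sqrt{2\left((s+b+b_r)\ln(1+s/(b+b_r)) - s\right)}\) with the regularization term \(b_r = 10\), where \(s\) and \(b\) are the summed weights of true signal and true background events among those labeled "s". The helper names `ams` and `ams_for_fraction` and the toy Gaussian-score data are hypothetical illustrations, not my actual code and not the contest data.

```python
import math
import random


def ams(s, b, b_reg=10.0):
    """Approximate median significance, HiggsML-style (assumed definition).

    s, b: summed weights of true signal / true background events that were
    labeled 's'; b_reg: regularization term (10 in the HiggsML challenge).
    """
    x = s / (b + b_reg)
    # max() guards against a tiny negative value from rounding when s is ~0
    return math.sqrt(max(0.0, 2.0 * ((s + b + b_reg) * math.log1p(x) - s)))


def ams_for_fraction(scores, labels, weights, fraction):
    """Label the top `fraction` of events (by classifier score) as 's' and
    return the AMS of that selection. Hypothetical helper for illustration."""
    ranked = sorted(zip(scores, labels, weights), key=lambda t: t[0], reverse=True)
    selected = ranked[: round(fraction * len(ranked))]
    s = sum(w for _, label, w in selected if label == "s")
    b = sum(w for _, label, w in selected if label == "b")
    return ams(s, b)


# Toy data (made up): signal events get somewhat higher classifier scores.
rng = random.Random(0)
scores, labels, weights = [], [], []
for _ in range(10_000):
    is_signal = rng.random() < 0.3
    scores.append(rng.gauss(1.0 if is_signal else 0.0, 1.0))
    labels.append("s" if is_signal else "b")
    weights.append(1.0)

for fraction in (0.200, 0.197):
    value = ams_for_fraction(scores, labels, weights, fraction)
    print(f"top {fraction:.1%} labeled 's': AMS = {value:.3f}")
```

With unit weights the toy AMS values are not comparable to the leaderboard numbers; the point is only that the selected fraction is a single tunable parameter whose optimum depends on how well the rest of the code separates signal from background.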

In principle, it's not a big deal.

Let me offer you a few hypotheses about this contest, treated as a pure strategy game. I personally believe that no contestant is really trying to save some "guaranteed cool improved program or parameters" for later. The usual explanation for "saving one's gems for later" is that people want their foes to rest on their laurels so that the foes' scores won't get too good.

Maybe. But if you manage to post a much better score than everyone else, I think you discourage the foes even more efficiently. You show them that they are losers and that it's a waste of time for them to try to catch up with you! I have surely been discouraged repeatedly by some of the leaders in this way. And I am still intimidated by this guy, babe, or humanoid called nhlx5haze with a score of nearly 3.79 at the top. Even his, her, or its username is such that many major servers would accept it as a "strong password". (Update: nhlx5haze should be nhl5xhaze, and it stands for "Northern Lights #5 x Haze", a marijuana strain obtained by crossing the two strains.)

And yes, quite the contrary: if I could, I would like to win or earn a medal wire-to-wire, leading from start to finish.

It's generally believed that K2, the second-highest peak on Earth, is the most difficult one to climb. That's nice, but as a friend of brute numbers, I would still prefer to conquer Mount Everest.

There is another reason why I think people are not saving big improvements for later: you are never quite sure that they work. A really high score you predict locally may be due to special properties of the 250,000 publicly available training events you may use to estimate your final score. The results might be very different for the 100,000 "public" test events, and different again for the remaining 450,000 "private" events. Someone may think that he can beat 3.8, but is he sure he would be leading? Maybe the method won't work in the real world because its apparent success comes from overfitting to the training dataset, which is inevitable to some extent. If you don't check whether your "secret new methods or settings" work in the real world and you live on the hope that they must produce a great score, maybe it is you who is resting on your laurels, and you won't be forced to try other things that may turn out to be necessary in this harsh competition.
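One way to see how much a locally measured score may owe to the particular sample is a bootstrap: resample the events with replacement many times and look at the spread of the AMS for a fixed selection. The Python sketch below is only an illustration under stated assumptions: the events are made-up toy events with unit weights, the AMS uses the HiggsML-style formula with \(b_r = 10\), and the function names are hypothetical; it is not how the organizers computed their numbers.

```python
import math
import random
import statistics


def ams(s, b, b_reg=10.0):
    # HiggsML-style approximate median significance (assumed definition)
    x = s / (b + b_reg)
    return math.sqrt(max(0.0, 2.0 * ((s + b + b_reg) * math.log1p(x) - s)))


def bootstrap_ams(events, n_boot=200, seed=0):
    """Mean and standard deviation of the AMS over bootstrap replicates.

    `events` is a list of (selected, label, weight) tuples: `selected` says
    whether the classifier labeled the event 's', `label` is the true class.
    Each replicate resamples the whole event list with replacement, so the
    number of selected events fluctuates as it would in a fresh dataset.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_boot):
        sample = rng.choices(events, k=len(events))
        s = sum(w for sel, label, w in sample if sel and label == "s")
        b = sum(w for sel, label, w in sample if sel and label == "b")
        values.append(ams(s, b))
    return statistics.mean(values), statistics.stdev(values)


# Toy events (made up): a fixed classifier keeps 60% of signal, 5% of background.
rng = random.Random(1)
toy = []
for _ in range(5_000):
    label = "s" if rng.random() < 0.3 else "b"
    selected = rng.random() < (0.6 if label == "s" else 0.05)
    toy.append((selected, label, 1.0))

mean, std = bootstrap_ams(toy)
print(f"AMS = {mean:.2f} +/- {std:.2f}")
```

A method whose local gain is small compared with such a spread may simply be fitting the peculiarities of the sample.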

So I believe that what you see on the leaderboard is what people have right now. The scores are getting better as people improve their code, unfortunately way too quickly.

There is a big question for each competitor: how revolutionary should the changes she tries in her new submissions be? (Are there any girls in the contest? I would like to know.) This question is remarkably similar to the questions about "courage" in physics.

Many people love to give extreme answers. Some wouldn't allow big deviations at all and would ban any revolutions (and discoveries) in science as a matter of principle. Others would love to live in a permanent Trotskyist scientific revolution in which things change dramatically all the time and people produce hypotheses that seem to have almost no relationship to what has been established. These folks would love to introduce affirmative action supporting bold researchers; the fact that, by definition, a person who is being spoiled by the system can't be bold eludes them. Sometimes both extremes are advocated by the very same people, which is really ironic, but it's widespread, too.

In the contest, the first, excessively conservative group would probably get stuck with not-quite-winning scores because they would never improve the essence of their algorithm much. These folks would just be lazily swimming in their stinky Czech carp pond, the phrase our nation usually uses for its tendency to stay in the grey average where not many exciting things happen. They could end up playing with noise, which may be completely useless for the final score. The second, excessively radical group would probably use programs so far from the "optimal islands", and in such random directions, that they would produce random lousy scores, too.

It's very important to keep some balance. And by balance, I don't just mean "medium-sized fluctuations" around your personal record; I mean a mixture of small and large deviations. You should have some idea of how much you may improve your score by small, relatively controllable modifications of your code, and sometimes these small, controllable changes may be enough. But you should always reserve some attempts (contestants may make 5 submissions a day, so something like 500 submissions in total) for more audacious experiments.

In Monte Carlo simulations, there is a concept of "temperature" that really works like temperature in statistical physics. If the temperature is low, you get stuck in a narrow vicinity of the low-energy microstates of the system (in the statistical physics case) or in a small neighborhood of the code that produced your personal record or other good scores (in the contest). If the temperature is high, you hardly care whether your new attempt resembles the previous successful submissions. For each situation, there exists some kind of optimum temperature, a balance between audacious and conservative methods to improve your score.
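The "temperature" above is the one in the Metropolis acceptance rule used in Monte Carlo simulations and simulated annealing. Here is a minimal Python sketch of that rule, phrased for maximizing a score rather than minimizing an energy: a worse attempt with score change \(\Delta < 0\) is accepted with probability \(\exp(\Delta/T)\). Applying it to contest submissions is only the analogy from the text; the function names and the sample numbers (a cost of 0.02 in AMS, the three temperatures) are hypothetical.

```python
import math
import random


def metropolis_accept(delta_score, temperature, rng):
    """Metropolis acceptance rule, phrased for maximizing a score.

    Improvements (delta_score >= 0) are always accepted; a worse attempt is
    accepted with probability exp(delta_score / temperature).
    """
    if delta_score >= 0:
        return True
    if temperature <= 0:
        return False  # zero temperature: pure greedy hill climbing
    return rng.random() < math.exp(delta_score / temperature)


def acceptance_rate(delta_score, temperature, trials=10_000, seed=0):
    """Fraction of trials in which a change of size delta_score is accepted."""
    rng = random.Random(seed)
    accepted = sum(metropolis_accept(delta_score, temperature, rng) for _ in range(trials))
    return accepted / trials


# A hypothetical attempt that costs 0.02 in AMS, at three temperatures
for temperature in (0.001, 0.01, 0.1):
    rate = acceptance_rate(-0.02, temperature)
    print(f"T = {temperature}: accepted in {rate:.1%} of trials")
```

At the lowest temperature such a step is essentially never taken, so the search stays glued to the current record; at the highest one it is taken most of the time, so the search wanders almost freely.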

I still don't think that I can or will win the contest, but there's no no-go theorem here, so I will keep on trying unless the situation looks clearly hopeless at some point. Balazs Kegl confirmed my rough calculations and told me that the standard deviation of the preliminary score is about\[

\Delta(AMS) \approx 0.08

\] so differences up to this value, or a bit more, may be due to good luck. For the final "private" dataset, the standard deviation is about \(0.04\), lower by a factor of \(\sqrt{450{,}000/100{,}000}\approx 2.1\) or so, because the noise scales like the inverse square root of the number of events. However, the organizers also say that in reality, most of the huge \(0.08\) noise is due to special properties of the "test" dataset or its "public" part, not so much to the peculiarities of the individual contestants' algorithms. So the contestants' AMS scores are probably rather strongly correlated, and the preliminary leaderboard could be a more accurate representation of the final leaderboard than huge numbers like \(0.08\) would suggest.
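As a quick check of the arithmetic (a worked example, not a statement by the organizers), the \(1/\sqrt{N}\) scaling gives\[

\frac{0.08}{\sqrt{450{,}000/100{,}000}} = \frac{0.08}{\sqrt{4.5}} \approx \frac{0.08}{2.12} \approx 0.038 \approx 0.04.

\] The correlation argument can be quantified with the standard formula for the spread of the difference of two correlated scores, where \(\rho\) is the correlation coefficient between contestants A and B; the value \(\rho = 0.9\) is purely hypothetical:\[

\sigma(\mathrm{AMS}_A - \mathrm{AMS}_B) = \sqrt{\sigma_A^2 + \sigma_B^2 - 2\rho\,\sigma_A\sigma_B} = \sqrt{2\cdot 0.08^2\,(1-0.9)} \approx 0.036,

\] compared with \(\sqrt{2}\cdot 0.08 \approx 0.11\) for uncorrelated scores. The ranking between two contestants is therefore much less noisy than each score alone.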

The previous sentence is both good news and bad news for the meritocracy of the contest. It's good news because, if it's true, a smaller part of your chances to win is due to luck, so one may be more motivated to fight for every \(0.01\) in the AMS score: these improvements might be "real" and not just "errors of measurement". On the other hand, I think it is bad news as well because it shows that the contestants are really using very similar algorithms with (almost) the same general, "systematic" defects. In this sense, they resemble "contemporary climate models" that all err on the same side of the truth.

If the contestants were really diverse and democratically covered "all directions" in which one may deviate from the truth, or from a hypothetical optimum algorithm, then the random parts of their scores would really fluctuate independently, with a standard deviation close to the \(0.08\) that is currently attributed mostly to the special features of the test.csv dataset. You could then get closer to the truth by averaging their opinions about each event. But because they tend to err in the same direction, you don't gain much knowledge by averaging them, just like in Feynman's parable about the length of the emperor's nose. If no one has seen the nose, you won't get more accurate knowledge by averaging the opinions of many people!
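The emperor's-nose point fits a simple, hypothetical error model: every estimate shares a common error with standard deviation \(\sigma_{\text{shared}}\) and has its own independent error with standard deviation \(\sigma_{\text{indep}}\). The average of \(n\) such estimates then has standard deviation \(\sqrt{\sigma_{\text{shared}}^2 + \sigma_{\text{indep}}^2/n}\), so averaging can never push the error below \(\sigma_{\text{shared}}\). The Python sketch below (function names and numbers are illustrative assumptions) compares this formula with a small Monte Carlo simulation.

```python
import math
import random
import statistics


def std_of_average(sigma_shared, sigma_indep, n):
    """Std of the mean of n estimates with a common error (sigma_shared)
    plus independent errors (sigma_indep): only the latter shrinks with n."""
    return math.sqrt(sigma_shared**2 + sigma_indep**2 / n)


def simulate_average_error(sigma_shared, sigma_indep, n, trials=5_000, seed=0):
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        shared = rng.gauss(0.0, sigma_shared)  # the error everyone shares
        estimates = [shared + rng.gauss(0.0, sigma_indep) for _ in range(n)]
        errors.append(sum(estimates) / n)  # error of the averaged opinion
    return statistics.pstdev(errors)


for n in (1, 10, 100):
    print(
        f"n = {n:3d}: formula {std_of_average(1.0, 1.0, n):.3f}, "
        f"simulation {simulate_average_error(1.0, 1.0, n):.3f}"
    )
```

With \(\sigma_{\text{shared}} = 0\) the error of the average falls like \(1/\sqrt{n}\); with any shared error it saturates at \(\sigma_{\text{shared}}\), no matter how many opinions you average.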
