Tuesday, February 23, 2010

Microsoft browser lottery: a random algorithm that is not so random

This is an amusing story that involves some simple mathematics, so let me tell you about it.

TechCrunch and Neowin have promoted a story originally invented by a Slovak daily about the Internet, DSL.SK (orig in SK). What is the problem?

Microsoft will soon be forced to accept the rules of the EU capitalism. Unlike the conventional capitalism where the companies want to defeat their competitors, the EU capitalism requires companies to help others to defeat themselves. It turns out that this is exactly what Microsoft has done. ;-)

What did they do? Since March, the Windows 7 users in the EU will be offered the ballot above. You may choose one of the five browsers - MSIE, Firefox, Chrome, Safari, and Opera. The competitors have insisted that the ordering must be random.

On the website BrowserChoice.EU where you can test this random ballot, you also find the JavaScript code that produces the random ordering: look at the bottom of it. It's likely that the same JavaScript code, run on Microsoft Internet Explorer, will actually decide about the ordering seen on the displays.

The JavaScript is simple and, at the beginning, it also looked fair: I thought that it produced five real random numbers (using Math.random) uniformly distributed in the interval (-0.5, +0.5), associated them with the five browsers, and sorted them according to the real numbers. Clearly, by the S_5 symmetry of the distribution, each of the 5! = 120 permutations would be equally likely.

So each of the browsers should end up being 1st, 2nd, 3rd, 4th, 5th in 20% of the cases. The average rank should be 3.0 for each browser. To be really sure, I have run a simple Mathematica code 10,000 times and it is indeed the case. Mathematica works. The deviations from 20% are less than 1%.

However, the DSL.SK folks claim that after 10,000 runs or so, the numbers that should be uniformly 20% in the five-by-five table actually go from 6% to 50%. They're heavily non-uniform. And Microsoft Internet Explorer actually punishes itself. MSIE ends on the fifth place in a whopping 50.1% of the cases.

If you ask Firefox to evaluate the same problem, the results are completely different, they claim. But they don't matter because the ballot will only run on MSIE. :-)

Because the algorithm should manifestly produce the 20% percentages for every browser, assuming that the random numbers are random and that they use my algorithm, there would have to be a problem with the "Math.random" generators incorporated in the JavaScript of various browsers. There may be a correlation between the consecutive random numbers. For example, the random numbers in (0,1) must have the tendency to increase in many steps, and then drop quickly.

But even if this is the case, there would have to be a periodicity of five randomly generated numbers "incorporated" into the flawed pseudorandom generator because even the bias mentioned in the previous paragraph wouldn't be enough to assign the MSIE with the fifth place so often.

Needless to say, even if the distribution of the orderings is non-uniform, it doesn't matter in "reality" because the users are likely to take the names of the browsers into account, but even if they ignored the names, they're unlikely to universally pick the first browser - they're more likely to add another random generator to the process which would lead to another layer of randomization, anyway. ;-)

Update I

Holy crap, I am looking at the JavaScript code more carefully and there's some complicated rubbish instead of my simple algorithm above. So I am no longer sure whether they actually have a fair algorithm. Or is this extra crap only activated if the input contains more than 5 browsers? Why the hell is the order shifting twice every time you click "reload"?

Update II

Holy crap. I finally understood what it means to insert an argument into the array.sort(argument) function and their algorithm is really wrong. I thought that they would first produce the five random numbers associated with the browsers, and then sorted them - which would be fair.

But they actually produce a new random number for each browser every time a pair of browsers is being compared, when they're automatically sorted by the sort function! So it really depends on the internal mechanism how the "sort" function works in various implementations of the JavaScript (see a review of different sorting algorithms) - how it organizes the one-by-one comparisons - and the results are bound to be non-uniform because e.g. the first browser is more likely to be compared many times.

It's unbelievable that they screw such a simple thing as a random ordering of five entries. :-) It's so simple to do it right! So after an hour of research into this issue, I am absolutely certain that the distributions generated by the algorithm are non-uniform and browser-dependent, indeed. Here is my simple Mathematica code that produces all the 120 orderings equally often:
repetitions = 10000;
frequencies = Table[0, {i, 5}, {j, 5}];

For[i = 1, i <= repetitions, i++,
a = RandomReal[{0, 1}, 5];
b = Ordering[a];
For[ii = 1, ii <= 5, ii++,
frequencies[[ii, b[[ii]] ]] = frequencies[[ii, b[[ii]] ]] + 1;]]

rel = N[100*frequencies/repetitions];
rel // MatrixForm
The result is:
{20.68, 19.65, 20.51, 19.55, 19.61},
{19.94, 20.26, 19.81, 20.22, 19.77},
{19.84, 20.45, 19.68, 19.28, 20.75},
{19.57, 19.83, 20.28, 20.59, 19.73},
{19.97, 19.81, 19.72, 20.36, 20.14}

No comments:

Post a Comment