Friday, May 07, 2010

First digit is most likely one: Benford's law is no mystery

A new Chinese preprint by Shao and Ma, promoted by the arXiv blog, makes the claim that Benford's law remains mysterious.

In case you don't know, Benford's law tells us that the probability that the first digit of any real random quantity - such as the price of a stock - is N equals
P(N) = log[(N+1)/N] / log(10).
In particular, the different digits have the following probabilities:

Note that the captions in this chart, taken from the English Wikipedia, are written in a Slavic language, suggesting that it's more common for the Slavs to understand Benford's law. At any rate, the probability that the first digit equals one exceeds 30% while the probability that the first digit is nine is below 5%.
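
The probabilities in the chart follow directly from the formula above. As a minimal sketch (Python is an arbitrary choice, and `benford_probability` is a hypothetical helper name), they can be tabulated like this:

```python
import math


def benford_probability(digit: int) -> float:
    """P(N) = log[(N+1)/N] / log(10) for a leading digit N in 1..9."""
    if not 1 <= digit <= 9:
        raise ValueError("a leading digit must be between 1 and 9")
    return math.log((digit + 1) / digit) / math.log(10)


if __name__ == "__main__":
    for d in range(1, 10):
        print(f"{d}: {100 * benford_probability(d):4.1f} %")
```

It prints 30.1 % for the digit 1 and 4.6 % for the digit 9. The nine values add up to one because the logarithms telescope: log(2/1) + log(3/2) + ... + log(10/9) = log(10).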

The Chinese authors repeat a statement that is very widespread:
One may simply presume that occurrence of the first digit of any randomly chosen data set is approximately uniformly distributed, but that is not the very case in real world.
This sentence is deeply misleading but not "fully" untrue - because of the vague word "may" and the undefined word "one" at the beginning. At any rate, a more accurate version of the sentence would say
A very stupid person may simply presume that occurrence of the first digit of any randomly chosen data set is approximately uniformly distributed, but that is not the very case in real world.
Why? Simply because there exists no rational reason to think that all digits should be equally likely. Note that we would have to mean "all digits except for 0" because leading zeros can't be significant figures, by definition. This exception we have to give to the digit "0" is the first hint why the "naive" uniform distribution is wrong for the other digits, too.

But why would you think that each digit has the probability of 1/9 to be the first digit of a random real number? Well, you could consider "X" which is between 1 and 10 and has a uniform distribution on this linear scale. Clearly, each leading digit 1...9 is then equally likely: the probability is 1/9.

However, unless you're stupid, you must realize that you have cherry-picked the endpoints. You could also consider "X" to be between 1 and 20. If you do so, all numbers in the intervals "(1,2)" and "(10,20)" begin with "1"; their total length is 1 + 10 = 11 out of 19, so the probability of a leading "1" ends up being 11/19, i.e. above 50 percent.

Once you understand this sensitivity, you may guess that the actual probability that the first digit is one is somewhere between 11% and 58%: for example, it may be slightly above 30%. And you would be right.
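
The sensitivity to the cherry-picked endpoint is easy to quantify. A minimal sketch, assuming X is uniform on (1, b) for an integer b ≥ 2 (the function name `leading_one_share` is hypothetical): it adds up the lengths of the pieces of (1,2), (10,20), (100,200), ... that lie below b and divides by the total length b - 1.

```python
from fractions import Fraction


def leading_one_share(b: int) -> Fraction:
    """Exact probability that X ~ Uniform(1, b) has first digit 1, for an integer b >= 2."""
    if b < 2:
        raise ValueError("b must be an integer >= 2")
    favourable = 0
    low = 1  # left end of the current "starts with 1" interval: 1, 10, 100, ...
    while low < b:
        # Numbers in [low, 2*low) start with "1"; clip that interval at b.
        favourable += min(2 * low, b) - low
        low *= 10
    return Fraction(favourable, b - 1)


if __name__ == "__main__":
    for b in (10, 20, 50, 100, 200, 1000):
        share = leading_one_share(b)
        print(f"b = {b:4d}: {share} = {float(share):.3f}")
```

For b ≥ 10 the share swings between 1/9 (when b is a power of ten) and roughly 0.56 to 0.58 (when b is twice a power of ten), depending only on where the upper endpoint happens to sit.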

Why is the log formula right?

Well, it's because "random" real numbers in the real world may a priori sit anywhere. What I mean is that even the order of magnitude of the numerical values is completely undetermined a priori.

For example, the price of a stock is equally likely to be between 1 and 10 as it is to be between 10 and 100. The two situations only differ by a rescaling of the price by a factor of ten. Very high stock prices may become unlikely because the stockholders may want to split the stocks, so that one can also sell or buy smaller units. And very tiny prices may become inconvenient, so people may merge the tiny stocks into bigger ones.

But that only happens for very high or very low prices - and there is no preferred point where the stockholders "act". So it's natural to make the approximation that real numbers such as stock prices are distributed over very long intervals, spanning many orders of magnitude. Adjacent orders of magnitude are equally likely. The distribution looks something like this:

Note that while the quantity is unlikely to be well below 1 or well above 10,000, it is pretty much equally likely to sit in the intervals "(10,100)" and "(100,1000)". That's why it's natural to use the "log(price)" as the x-axis.

Recall my promotion of exponential percentages, which is motivated by a similar attitude.

But once you use "log(price)" as the x-axis, you may see that the probability distribution for "log(price)" changes slowly within each order of magnitude - so it's nearly constant there. If you want to determine the probability that the first digit is e.g. 8, you look at all the blue strips above where the first digit is 8.

Effectively, you compactify the graph - so that the interval "(1,10)" is just reused for all other real numbers. It's not hard to see that the portion
P = ln(9/8) / ln(10/1)
of the interval describes numbers - prices - that start with "8". Such "blue" numbers are much less likely than the "red" numbers that begin with the digit 1.
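
The compactification can be checked numerically. The sketch below is an illustration, not the method of the preprint: it draws numbers whose logarithm is uniform over ten whole decades, reads off the first digit from the fractional part of log10(x) (i.e. from log10(x) modulo 1), and compares the observed frequencies with ln((N+1)/N) / ln(10). The sample size, the seed, the ten-decade range and the helper names are arbitrary assumptions.

```python
import math
import random
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant digit of x > 0, read off from the fractional part of log10(x)."""
    mantissa = math.log10(x) % 1.0        # in [0, 1): the position within one decade
    return min(int(10 ** mantissa), 9)    # guard against floating-point round-up to 10


def simulated_shares(n: int = 200_000, decades: int = 10, seed: int = 1) -> dict[int, float]:
    """Digit frequencies for x = 10**u with u uniform on (0, decades)."""
    rng = random.Random(seed)
    counts = Counter(leading_digit(10 ** rng.uniform(0, decades)) for _ in range(n))
    return {d: counts[d] / n for d in range(1, 10)}


if __name__ == "__main__":
    shares = simulated_shares()
    for d in range(1, 10):
        predicted = math.log((d + 1) / d) / math.log(10)
        print(f"{d}: simulated {shares[d]:.4f}, predicted {predicted:.4f}")
```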

The ratio of the probabilities that the first digit is 1 or 8 follows from a simple argument: because the function on the graph above - the distribution - changes slowly with "x" on the x-axis, the blue and red areas may be approximately calculated as the product of the height and the width. But the height of a blue area is pretty much equal to the height of a nearby red area. So the areas only differ by the widths, and the ratio of the widths, "ln(2/1) / ln(9/8)", is the ratio of the probabilities that the first digit is 1 vs. 8.
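
The argument that nearby strips have nearly equal heights can be tested with a smooth density that is not flat. As an assumption chosen purely for illustration, let log10(price) be Gaussian with mean mu and standard deviation sigma (a log-normal price); the sketch sums the strips for each digit over all decades with the Gaussian CDF and compares the 1-vs-8 ratio with ln(2/1)/ln(9/8) ≈ 5.885. The helper names and the parameter values are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def digit_probability(d: int, mu: float, sigma: float, span: int = 30) -> float:
    """P(first digit = d) if log10(x) ~ Normal(mu, sigma).

    Sums the strips [k + log10(d), k + log10(d + 1)) for decades k = -span..span;
    assumes |mu| and sigma are small compared with span.
    """
    total = 0.0
    for k in range(-span, span + 1):
        lo = (k + math.log10(d) - mu) / sigma
        hi = (k + math.log10(d + 1) - mu) / sigma
        total += normal_cdf(hi) - normal_cdf(lo)
    return total


def ratio_one_to_eight(mu: float, sigma: float) -> float:
    return digit_probability(1, mu, sigma) / digit_probability(8, mu, sigma)


if __name__ == "__main__":
    target = math.log(2 / 1) / math.log(9 / 8)
    print(f"width ratio ln(2)/ln(9/8) = {target:.4f}")
    for sigma in (0.2, 0.5, 1.0, 2.0):
        print(f"sigma = {sigma}: P(1)/P(8) = {ratio_one_to_eight(0.37, sigma):.4f}")
```

When the density spreads over about one decade or more (sigma = 1 or 2), the ratio agrees with the width ratio to many digits; when it is squeezed into a fraction of a decade (sigma = 0.2), the heights of the red and blue strips are no longer comparable and the ratio is far off.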

As a function of "price", the first digit is a quasi-periodic function of the price. More precisely, it is a periodic function of "log(price)". So the first digit is determined by an "angular variable": prices that differ by a factor of 10 or of any integer power of 10 are identified.

There is exactly one distribution of an angular variable
log(price) mod log(10)
that is invariant under the multiplication of "price" by any positive constant (i.e. under the choice of "units"), namely the uniform distribution for "log price":
P[log price ∈ (y,y+dy)] = C dy,   C = 1/log(10).
The constant "C" is chosen so that the integral of the probability distribution over one "fundamental region", e.g. over "(1,10)", is normalized to unity.
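
Integrating this uniform density over the strip of "log price" values whose first digit is N, i.e. from log N to log(N+1) inside the fundamental region, recovers the formula from the beginning of the article, and the normalization by C makes the nine probabilities add up to one:

```latex
P(N) = \int_{\log N}^{\log(N+1)} C \, dy
     = \frac{\log(N+1) - \log N}{\log 10}
     = \frac{\log\left[(N+1)/N\right]}{\log 10},
\qquad
\sum_{N=1}^{9} P(N) = \frac{\log 10 - \log 1}{\log 10} = 1 .
```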

You can always use any base of the logarithms in my formulae above - but you must use it consistently.

With this uniform distribution, you can easily see that the distribution is invariant under the change of the units,
price → price / newunit,
i.e. log(price) → log(price) - log(newunit)
simply because the multiplicative rescaling is just an additive shift of the logarithm, and the additive shift doesn't change the uniform distribution of the angular variable.
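
The invariance can be checked against the naive model by converting units. In the sketch below, which is an illustration under stated assumptions, one sample is log-uniform over six whole decades, the other is uniform on (1, 10), and the conversion factors 0.5 and 3.7 are arbitrary hypothetical units. Only the log-uniform sample keeps its first-digit frequencies after the rescaling.

```python
import math
import random
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant digit of x > 0 via the angular variable log10(x) mod 1."""
    return min(int(10 ** (math.log10(x) % 1.0)), 9)


def digit_shares(samples: list[float]) -> list[float]:
    counts = Counter(leading_digit(x) for x in samples)
    return [counts[d] / len(samples) for d in range(1, 10)]


def rescaled_shares(samples: list[float], unit: float) -> list[float]:
    """Digit frequencies after the change of units price -> price / unit."""
    return digit_shares([x / unit for x in samples])


rng = random.Random(42)
N = 100_000
log_uniform = [10 ** rng.uniform(0.0, 6.0) for _ in range(N)]  # six whole decades
naive = [rng.uniform(1.0, 10.0) for _ in range(N)]  # "each digit has probability 1/9"

if __name__ == "__main__":
    for unit in (1.0, 0.5, 3.7):
        print(f"unit {unit}:")
        print("  log-uniform:", [round(s, 3) for s in rescaled_shares(log_uniform, unit)])
        print("  naive      :", [round(s, 3) for s in rescaled_shares(naive, unit)])
```

Dividing the naive sample by 0.5 moves it to (2, 20), where the whole interval (10, 20) starts with "1", so its share of leading ones jumps from 1/9 to 10/18; the log-uniform sample is merely shifted along the log axis by a whole number of nothing but a constant, and its mantissa stays uniform.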

You can also see why "0" had to be treated differently. The "egalitarian" people who expect all digits between 1 and 9 to be fairly represented would say that 0 had to be completely removed because the digit is a far-right denier who doesn't enjoy the rights of the working or middle class (or whatever class the "egalitarian" people want to treat democratically, while sharply suppressing everyone else).

While it's correct that "0" as a price or another real quantity has to be removed from the considerations (and be given a vanishing probability), the reason is actually different. The real reason is that "0" is a far-left digit because "log(0)" equals minus infinity, where all the probability distributions already have to drop to zero. ;-)

At any rate, there's nothing mysterious about Benford's law. It's linked to scale invariance, i.e. independence of multiplicative rescaling (or of the choice of units). And this "symmetry" is easily and fully understood and analyzed if you consider "log(price)" because multiplicative changes become additive shifts, which are simpler.

Well-defined distributions

The Chinese authors test several well-defined distributions - such as the Boltzmann-Gibbs classical distribution and its quantum counterparts (Bose-Einstein and Fermi-Dirac). Not surprisingly, they find that the distribution of first digits "fluctuates around" the universal Benford values.

What's more interesting is their claim that the Bose-Einstein distribution, regardless of the only important parameter, the temperature (and its "first digit"), always reproduces Benford's law exactly. It's pretty interesting that one type of quantum particles - bosons - has this property while the other type (and the classical result) doesn't.

I haven't even checked the statement.

But because writing real numbers using digits doesn't seem terribly fundamental to me, and the Bose-Einstein accident is just one particular property (constancy) of a function summed over the "decades", I don't expect the finding to be more fundamental than that, either. :-)

Update: no coincidence

I see, there's no interesting identity behind the Bose-Einstein agreement. It works exactly simply because the Bose-Einstein distribution isn't normalizable. The relevant integral over "E" of "1/(exp(b.E)-1)" logarithmically diverges near "E=0", so one must use a non-normalizable distribution and uniformly cover infinitely many decades, just like in the idealized derivation of Benford's law.
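
The log divergence can be made concrete with closed-form antiderivatives. This sketch assumes energies restricted to [10^-300, 10^4), with the lower cutoff standing in for E → 0, and computes the share of each first digit for the Bose-Einstein occupation 1/(exp(b·E)-1), the Fermi-Dirac occupation 1/(exp(b·E)+1) and the Boltzmann factor exp(-b·E). The cutoffs, the b values and the helper names are illustrative assumptions. Only the Bose-Einstein shares land on Benford's values, for any b, because almost all of their (divergent) weight sits in the many decades near E = 0 where the occupation behaves like 1/(b·E).

```python
import math
from typing import Callable

# mass(lo, hi) = integral of an unnormalized occupation over the energy interval [lo, hi]
Mass = Callable[[float, float], float]


def bose_einstein(b: float) -> Mass:
    # Antiderivative of 1/(exp(b*E) - 1) is ln(1 - exp(-b*E)) / b; the constant 1/b
    # cancels after normalization and is dropped. expm1 keeps precision at tiny E.
    def F(e: float) -> float:
        return math.log(-math.expm1(-b * e))
    return lambda lo, hi: F(hi) - F(lo)


def fermi_dirac(b: float) -> Mass:
    # Antiderivative of 1/(exp(b*E) + 1) is -ln(1 + exp(-b*E)) / b (1/b dropped again).
    def F(e: float) -> float:
        return -math.log1p(math.exp(-b * e))
    return lambda lo, hi: F(hi) - F(lo)


def boltzmann(b: float) -> Mass:
    # Integral of exp(-b*E) over [lo, hi] is (exp(-b*lo) - exp(-b*hi)) / b (1/b dropped).
    return lambda lo, hi: math.expm1(-b * lo) - math.expm1(-b * hi)


def digit_shares(mass: Mass, low_decade: int = -300, high_decade: int = 4) -> list[float]:
    """Normalized weight of energies in [10**low_decade, 10**high_decade) with first digit 1..9."""
    per_digit = [
        sum(mass(d * 10.0 ** k, (d + 1) * 10.0 ** k) for k in range(low_decade, high_decade))
        for d in range(1, 10)
    ]
    total = sum(per_digit)
    return [m / total for m in per_digit]


if __name__ == "__main__":
    print("Benford            ", [round(math.log10(1 + 1 / d), 3) for d in range(1, 10)])
    families = (("Bose-Einstein", bose_einstein), ("Fermi-Dirac", fermi_dirac), ("Boltzmann", boltzmann))
    for name, family in families:
        for b in (1.0, 3.7):
            shares = digit_shares(family(b))
            print(f"{name:13s} b={b}", [round(s, 3) for s in shares])
```

Pushing the lower cutoff further toward zero only brings the Bose-Einstein shares closer to Benford's values, while the normalizable Fermi-Dirac and Boltzmann cases converge to fixed, non-Benford shares.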

The triviality of the exact agreement - a consequence of the log divergence - makes it even more surprising that they think they have found something deep.


  1. A mathematician colleague of mine
    many years ago
    expressed puzzlement over the predominance of 1's
    as the leading digit of randomly chosen numbers.

    My explanation to him was:
    "Look at the face of a slide rule!"

    Of course they don't make slide rules any more
    so I guess that argument no longer applies.

  2. I read your blog regularly. Thanks for it. I'll say hello to Steve McIntyre for you (Climate Audit). He lives just down the street, and I know he's a fan of yours.

    I'm not a mathematician nor academic.

    A few weeks ago I decided to sink my teeth into Benford's Law because I was always slightly disturbed that random numbers generated by a software generator did not Benford, whereas 'natural' datasets of random things did. So I tried to explain the differences between the two kinds of random.

    I don't know how successful I was, but it sure removed any mystery I once felt for the subject. I wrote it up here

    There are a lot of 'ifs' here. If you actually read comments to old posts, and if you like the writeup, and if you think there might be some merit to publishing a link to it (from I don't know where, leave it to you), I'd very much appreciate it. I provide a link to your blog.

    Thanks for the interesting posts and cheers!

    Doug Bennion

  3. Dear Doug, the world is a small place! I am Steve's fan, too - not only of Steve as a bright independent researcher but maybe also of Steve as a squash player. Squash is hard for me. ;-)

    Please, send him my best regards if you can. In person, that's quite touching.

    One needs to be careful to produce Benford-distributed numbers "artificially". Clearly, if you study numbers exp(X) where X is a random integer chosen uniformly from a large enough interval, e.g. 0-50, then exp(X) will satisfy Benford's law.

    Your webpage about the problem is nicely written and the domain name is impressively relevant, indeed.

    All the best

  4. Thanks very much Lubos.

    I found in my testing, that any product R1 x R2 x R3 say, with R = rnd * 100 say, Benfords pretty nicely. Your suggestion R = e ^ (rnd * 50) does as well, as does (any number) ^ (rnd * even a small number) because they are essentially self-multiplication products I guess.

    Next time I run into Steve, I'll pass on your hello. He was limping a bit a few days ago, no doubt from his recent matches :-)

    Thanks again! cheers!

  5. Sorry, the "Benford law" thing is contentless, because it implies no other, nor makes use of any other, features of a "random" sequence of integers 1 through 9, such as the average of a sum of a sequence of integers approaching 4.5 (the average of the integers 0 through 9).

    Meaning one could take the word "random" right out of it, and still have the same result if there were no a priori mention of a method of determining the terms of the sequence of integers

  6. Dear Brian, you're wrong. Benford's law assumes and implies no other patterns because no other patterns exist, and your claim that the average digit is 4.5 is just wrong if one considers ensembles of digits where the first digits constitute a sizable fraction of the digits.

    Just try to list the current prices of 1000 stocks and make the statistics how many of them start with 1, 2... or 9, you will see that Benford's law will hold and smaller digits such as 1 are much more frequent than larger ones such as 9.


  7. Hi Brian. Here is a list of the 300 brightest stars. Their distances from earth would be 'random'. Simply count the leading '1's in the distance column "dist ly", and you will arrive at approximately 30%. Or easier, count the leading '9's and arrive at about 5%.

  9. "Their distances from earth would be 'random'."

    Um, not quite. Their distances have been calculated using their brightness, which is a log intensity scale. Numbers appearing on that scale (from 1 to 10) would be proportional to the log of the distance.

  10. Dear Brian, the purpose of your comment is completely impenetrable to me. Do you actually disagree with something we're saying? If you do, it is not clear what it is.

    The magnitude is defined as a log of the distance - so what? The distance is random and the magnitude is random, too - because it's the same information.

    The leading digit of the distance does follow Benford's law because the possible values span a large number of orders of magnitude. The magnitude of the star doesn't obey Benford's law exactly because the magnitude is defined as a log in order to make all the possible values of the magnitude "comparable", so the distribution of most stars is peaked "somewhere" on the log scale.


  11. No, I'm not disagreeing necessarily.

    What I mean is, suppose I have a circular scale, as on the dial of a meter, calibrated to the log of the distance on that scale, from 1 to 10.

    Then the probability of finding a number on that scale from 1 to 2 is 0.30 of that distance, from 1 to 3 is 0.48 of that distance, etc.

    If this "dial" was used together with an instrument to measure the brightness of a star, say by extinction correlated with intensity of light, then (it seems to me that) not all brightness measurements made with this instrument would have equal probability of being made (even though the brightnesses or distances themselves would be random or equally probable).

    So I guess this would be a systematic introduction of a bias related to this particular physical method of measurement itself. Hope this isn't too obtuse.
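
Doug's observations in comment 4 are easy to reproduce. The sketch below assumes Python's `random` module as the "software generator" and follows his recipes loosely: a single rnd * 100, a product R1 x R2 x R3 with R = rnd * 100, and e ^ (rnd * 50). The seed, the sample size and the helper names (`rnd`, `max_deviation`) are hypothetical choices made for this illustration.

```python
import math
import random
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant digit of x > 0, via the fractional part of log10(x)."""
    return min(int(10 ** (math.log10(x) % 1.0)), 9)


def first_digit_shares(samples: list[float]) -> list[float]:
    counts = Counter(leading_digit(x) for x in samples)
    return [counts[d] / len(samples) for d in range(1, 10)]


def max_deviation(shares: list[float]) -> float:
    """Largest absolute gap between observed shares and log10(1 + 1/d)."""
    return max(abs(s - math.log10(1 + 1 / d)) for d, s in enumerate(shares, start=1))


rng = random.Random(7)
N = 100_000


def rnd() -> float:
    """Uniform on (0, 1]; random() may return 0.0, whose logarithm is undefined."""
    return 1.0 - rng.random()


single = [rnd() * 100 for _ in range(N)]
product = [math.prod(rnd() * 100 for _ in range(3)) for _ in range(N)]
exponential = [math.exp(rnd() * 50) for _ in range(N)]

if __name__ == "__main__":
    cases = (("rnd * 100", single), ("R1 x R2 x R3", product), ("e ^ (rnd * 50)", exponential))
    for name, samples in cases:
        gap = max_deviation(first_digit_shares(samples))
        print(f"{name:15s} max gap from Benford: {gap:.3f}")
```

A single uniform number gives roughly 1/9 for every digit, so its largest gap from Benford's values is close to 0.19; the product of three uniform factors and e ^ (rnd * 50) both spread over many orders of magnitude and come much closer, consistent with Doug's remark that they are essentially products.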