Benford’s Law
September 14, 2006 – 8:51 pm

Benford discovered that in many sets of data, the leading digit is much more likely to be ‘1’ than any other digit. Take, for example, the population counts of nations. The most significant digit probabilities there are as follows:

| Most significant digit (MSD) | Probability |
|---|---|
| 1 | 0.26 |
| 2 | 0.20 |
| 3 | 0.10 |
| 4 | 0.13 |
| 5 | 0.07 |
| 6 | 0.08 |
| 7 | 0.07 |
| 8 | 0.05 |
| 9 | 0.04 |
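
A table like this can be produced from any list of positive numbers by counting leading digits. The following is a minimal sketch; the function names `leading_digit` and `msd_frequencies` and the sample values are made up for illustration and are not the population data behind the table.

```python
from collections import Counter


def leading_digit(x):
    """Return the most significant (first non-zero) digit of a positive number."""
    if x <= 0:
        raise ValueError("x must be positive")
    # The first character in 1..9 of the printed number is its leading digit;
    # this also works for forms such as '0.042' or '1e+23'.
    return int(next(ch for ch in str(x) if ch in "123456789"))


def msd_frequencies(values):
    """Map each digit 1..9 to the fraction of values that start with it."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


if __name__ == "__main__":
    # Hypothetical sample values, chosen only to exercise the functions.
    sample = [1200, 1850, 23, 310000, 1.7, 0.042, 9]
    for digit, freq in msd_frequencies(sample).items():
        print(digit, round(freq, 2))
```

Reading the digit from the printed form avoids repeated division by 10, which can accumulate floating-point error for very large or very small values.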
The MathWorld article offers an explanation in terms of distributions that are invariant under changes of the measurement unit. I have to say, I wasn’t entirely convinced by that explanation. I’d like to offer a different theory that might account for this phenomenon.
Suppose X is a real random variable with a probability density P(X). For example, X may represent the population of a nation. Further suppose that P(X) is monotonically decreasing in X, and that X ranges from 1 to +infinity. Now look at the most significant digit of X. I claim that, in this very common situation, ‘1’ will be more likely than the rest of the digits. Further, ‘1’ will be more likely than ‘2’, which will be more likely than ‘3’, and so on.
To prove this claim, let’s consider different orders of 10. With 10^0, i.e. the range 1 through 10, what is the likelihood that 1 is the MSD (most significant digit)? That’s the likelihood that X is between 1 and 2, or P([1,2)). Similarly, the likelihood that digit 'd' is the MSD is P([d,d+1)). These intervals all have the same width, so because P(X) is monotonically decreasing, we have P([1,2)) > P([2,3)) > ... > P([9,10)).
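Now let's look at 10^1. In the same manner, the range [10, 20) is more likely than [20, 30), and so on. And the same is true for any order of 10. Now, what is the probability of getting 1 as an MSD when we select X at random? That's the sum over all orders of 10:

$$P(\mathrm{MSD}=1) = \sum_{k=0}^{\infty} P([10^k,\ 2\cdot 10^k))$$

And the probability of getting 2 as the MSD? That’s

$$P(\mathrm{MSD}=2) = \sum_{k=0}^{\infty} P([2\cdot 10^k,\ 3\cdot 10^k)).$$

We already showed that, for any k, $P([10^k,\ 2\cdot 10^k)) > P([2\cdot 10^k,\ 3\cdot 10^k))$, hence $P(\mathrm{MSD}=1) > P(\mathrm{MSD}=2)$. In the same manner, $P(\mathrm{MSD}=2) > P(\mathrm{MSD}=3)$, and so on. QED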
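
The claim can also be checked numerically for a concrete decreasing density. The sketch below assumes a shifted exponential distribution on [1, +infinity) with an arbitrary rate of 0.01; that choice, and the names `msd_probabilities` and `shifted_exponential_cdf`, are illustrative assumptions rather than anything taken from the datasets discussed here.

```python
import math


def msd_probabilities(cdf, max_order=12):
    """Return [P(MSD=1), ..., P(MSD=9)] for a distribution on [1, inf).

    For each digit d, sums P([d * 10^k, (d + 1) * 10^k)) over k = 0..max_order.
    """
    probs = []
    for d in range(1, 10):
        total = 0.0
        for k in range(max_order + 1):
            scale = 10 ** k
            total += cdf((d + 1) * scale) - cdf(d * scale)
        probs.append(total)
    return probs


def shifted_exponential_cdf(x, rate=0.01):
    """CDF of 1 + Exponential(rate); its density decreases on [1, inf)."""
    if x < 1:
        return 0.0
    return 1.0 - math.exp(-rate * (x - 1.0))


if __name__ == "__main__":
    probs = msd_probabilities(shifted_exponential_cdf)
    for digit, p in enumerate(probs, start=1):
        print(digit, round(p, 4))
    # Each digit should be strictly more likely than the next one.
    print(all(a > b for a, b in zip(probs, probs[1:])))
```

Truncating the sum at `max_order=12` is harmless for this distribution, because essentially all of its probability mass lies far below 10^13.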
So ‘1’ is more likely than ‘2’, but can this account for the difference we see in Benford’s law? (There, ‘1’ is more likely than ‘2’ by a factor of about 2.) It’s hard to say, because we don’t have the probability distributions for the datasets he used. But we can look at another well-known distribution, the power law:
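$$P(x) = C\,x^{-\alpha}, \qquad x \ge 1,\ \alpha > 1,$$

where normalization over $[1, \infty)$ gives $C = \alpha - 1$.

This distribution occurs in many natural phenomena, including, for example, the sizes of earthquakes. It is also a monotonically decreasing distribution, so it fits the bill. Let’s calculate the likelihood of getting an MSD of ‘m’:

$$P(\mathrm{MSD}=m) = \sum_{k=0}^{\infty} \int_{m\cdot 10^k}^{(m+1)\cdot 10^k} C\,x^{-\alpha}\,dx = \frac{C}{\alpha-1} \sum_{k=0}^{\infty} 10^{k(1-\alpha)} \left[ m^{1-\alpha} - (m+1)^{1-\alpha} \right]$$

$$= \frac{C}{\alpha-1} \cdot \frac{m^{1-\alpha} - (m+1)^{1-\alpha}}{1 - 10^{1-\alpha}}$$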
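Now let’s plug in some numbers. Let’s take a value for the exponent $\alpha$ of the power law $P(x) \propto x^{-\alpha}$ (with $x \ge 1$ and $\alpha > 1$) and calculate the probability ratio between an MSD of ‘1’ and ‘2’:

$$\frac{P(\mathrm{MSD}=1)}{P(\mathrm{MSD}=2)} = \frac{1 - 2^{1-\alpha}}{2^{1-\alpha} - 3^{1-\alpha}}$$

For exponents above roughly 1.3, this is even greater than the ratio of about 2 that appears in MathWorld.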
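
To get a feel for the numbers, here is a small sketch that evaluates the MSD probabilities of a power law. It assumes a density proportional to x^(-alpha) on [1, +infinity) with alpha > 1, for which summing the interval probabilities over all orders of 10 gives P(MSD = m) = (m^(1-alpha) - (m+1)^(1-alpha)) / (1 - 10^(1-alpha)). The exponents in the loop are assumed, illustrative values, and the function names are hypothetical. For comparison it also prints the ratio implied by the usual statement of Benford's law, P(d) = log10(1 + 1/d).

```python
import math


def power_law_msd_probability(m, alpha):
    """P(MSD = m) for a density proportional to x**(-alpha) on [1, inf)."""
    if alpha <= 1:
        raise ValueError("alpha must exceed 1 for the density to be normalizable")
    s = 1.0 - alpha
    # Geometric series over powers of 10, written in closed form.
    return (m ** s - (m + 1) ** s) / (1.0 - 10.0 ** s)


def msd_ratio_1_to_2(alpha):
    """Ratio P(MSD = 1) / P(MSD = 2) under the same power law."""
    return power_law_msd_probability(1, alpha) / power_law_msd_probability(2, alpha)


# Ratio implied by Benford's law itself: log10(2) / log10(3/2).
benford_ratio = math.log10(2) / math.log10(1.5)

if __name__ == "__main__":
    print("Benford ratio:", round(benford_ratio, 3))
    # Assumed exponents, for illustration only.
    for alpha in (1.1, 1.5, 2.0, 3.0):
        print(alpha, round(msd_ratio_1_to_2(alpha), 3))
```

Under these assumptions the ratio grows with the exponent, and it approaches the Benford value as alpha approaches 1 from above.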
As a final note, using Zipf’s law I checked a couple of datasets in the CIA World Factbook, and they are not distributed according to a power law. So what I’ve shown isn’t a direct explanation of Benford’s law for these datasets. But it does explain why Benford’s law appears in many naturally occurring phenomena.