Chapter 34: Explaining Benford's Law

Analysis of the Log-Normal Distribution

We have looked at two log-normal distributions, one having a standard deviation of 0.25 and the other a standard deviation of 0.5. Surprisingly, one follows Benford's law extremely well, while the other does not follow it at all. In this section we will examine the analytical transition between these two behaviors for this particular distribution.

As shown in Fig. 34-5d, we can use the value of OST(1) as a measure of how well Benford's law is followed. Our goal is to derive an equation relating the standard deviation of psf(g) to the value of OST(1), that is, relating the width of the distribution to its compliance with Benford's law. Notice that this has rigorously defined the problem (removed the fuzziness) by specifying three things: the shape of the distribution, how we are measuring compliance with Benford's law, and how we are defining the distribution width.

The next step is to write the equation for PSF(f), a one-sided Gaussian curve, having a value of one at f = 0, and a standard deviation of σf:

PSF(f) = e^(-f²/(2σf²))

Next we plug in the conversion from the logarithmic-domain standard deviation, σf = 1/(2πσg), and evaluate the expression at f = 1:

PSF(1) = e^(-2π²σg²)

Lastly, we use OST(1) = SF(1) × PSF(1), where SF(1) = 0.516, to reach the final equation:

OST(1) = 0.516 e^(-2π²σg²)          (Eq. 34-5)

As illustrated in Fig. 34-5c, the highest value in ost(g) is 0.301 plus OST(1), and the lowest value is 0.301 minus OST(1). These highest and lowest values are graphed in Fig. 34-8a. As shown, when the 2σ width of the distribution is 0.5 (as in Fig. 34-5a), the Ones Scaling Test will have values as high as 45% and as low as 16%, a very poor match to Benford's law. However, doubling the width to 2σ = 1.0 results in a high-to-low fluctuation of less than 1%, a good match.
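As a quick numerical check, here is a short Python sketch (the helper name ost1 is ours, not from the book) that evaluates Eq. 34-5 for the distribution widths discussed above:

import math

def ost1(sigma):
    # Eq. 34-5: OST(1) for a log-normal distribution whose pdf on the
    # logarithmic scale has standard deviation sigma
    return 0.516 * math.exp(-2 * math.pi**2 * sigma**2)

for width in (0.5, 0.6, 0.9, 1.0):    # the 2-sigma widths discussed above
    amp = ost1(width / 2)
    print(f"2*sigma = {width}: high = {0.301 + amp:.4f}, low = {0.301 - amp:.4f}")

Running this reproduces the extremes graphed in Fig. 34-8a: roughly 45% and 15% at 2σ = 0.5, collapsing to 30.5% and 29.7% at 2σ = 1.0.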

There are a number of interesting details in this example. First, notice how rapidly the transition occurs between following and not following Benford's law. For instance, two cases are indicated by A and B in Fig. 34-8, with 2σ = 0.60 and 2σ = 0.90, respectively. In Fig. 34-8b these same two distributions are shown on the linear scale. Now imagine that you are a researcher trying to understand Benford's law before reading this chapter. Even though these two distributions appear very similar, one follows Benford's law very well, and the other doesn't follow it at all! This gives you an idea of the frustration Benford's law has produced.

Second, even though the curves in Fig. 34-8a move together extremely rapidly, they never actually meet (excluding an infinite width, which isn't allowed for a pdf). For instance, from Eq. 34-5 a log-normal distribution with a standard deviation of three will follow Benford's law within about 1 part in 100,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000. That's pretty close! In fact, you could not statistically detect this error even with a billion computers, each generating a billion numbers each second, since the beginning of the universe.
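If you want to verify this astonishing number yourself, Eq. 34-5 with σ = 3 is a one-line computation (again our own check, using Python's math module):

import math
print(0.516 * math.exp(-2 * math.pi**2 * 3**2))   # about 3.6e-78, roughly 1 part in 10**77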

Nevertheless, this is a finite error, and it has caused frustration of its own. Again imagine that you are a researcher trying to understand Benford's law. You proceed by writing down some equation describing when Benford's law will be followed, and then you solve it. The answer you find is: Never! There is no distribution (excluding the oscillatory case of Fig. 34-6b) that follows Benford's law exactly. An equation doesn't tell you what is close, only what is equal. In other words, you find no understanding, just more mystery.

Lastly, the log-normal distribution is more than just an example; it is an important case where Benford's law arises in Nature. The reason for this is one of the most powerful driving forces in statistics, the Central Limit Theorem (CLT). As discussed in Chapter 2, the CLT states that adding many random numbers produces a normal distribution. This accounts for the normal distribution being so commonly observed in science and engineering. However, if a group of random numbers is multiplied, their logarithms add, and the result is a normal distribution on the logarithmic scale. Accordingly, the log-normal distribution is also commonly found in Nature. This is probably the single most important reason that some distributions are found to follow Benford's law while others do not: normal distributions are not wide enough to follow the law, while broad log-normal distributions follow it to a very high degree.

Want to generate numbers that follow Benford's law for your own experiments? You can take advantage of the CLT. Most computer languages have a random number generator that produces values uniformly distributed between 0 and 1. Call this function multiple times and multiply the numbers. It can be shown that PDF(1) = 0.344 for the uniform distribution, and therefore the product of these numbers follows Benford's law according to OST(1) = 51.6% × 0.344^α, where α is the number of random values multiplied together. For instance, multiplying ten such numbers produces a value drawn from a log-normal distribution with a standard deviation of approximately 0.75. This corresponds to OST(1) = 0.0012%, a very good fit to Benford's law.
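Here is a minimal sketch of this recipe, assuming only Python's standard library (the function names are ours, not from the book):

import math
import random

def leading_digit(x):
    # First significant digit of a positive number
    return int(10 ** (math.log10(x) % 1))

def benford_number(alpha=10):
    # Product of alpha uniform(0,1) random numbers. Their logarithms
    # add, so by the CLT the product is approximately log-normal and
    # follows Benford's law closely (OST(1) = 0.0012% for alpha = 10).
    x = 1.0
    for _ in range(alpha):
        x *= random.random()
    return x

n = 100_000
ones = sum(leading_digit(benford_number()) == 1 for _ in range(n))
print(f"fraction with leading digit 1: {ones / n:.4f}  (Benford predicts 0.3010)")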

If you do try some of these experiments, remember that the statistical variation (noise) on N random events is about SQRT(N). For instance, suppose you generate 1 million numbers in your computer and count how many have 1 as the leading digit. If Benford's law is being followed, this count will be about 301,000. However, when you repeat the experiment several times you find that it changes randomly by about 1,000 counts, since SQRT(1,000,000) = 1,000. In other words, using 1 million numbers allows you to conclude that the percentage of numbers with one as the leading digit is about 30.1% +/- 0.1%. As another example, the ripple in Fig. 34-3a is a result of using 14,414 samples. For a more precise measurement you need more numbers, and the required count grows very quickly. For instance, to detect an error as small as OST(1) = 0.0012% (the above example), you will need in excess of a billion numbers.
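To see this noise floor for yourself, repeat the counting experiment a few times. This sketch reuses leading_digit() and benford_number() from the example above; note that a million samples per trial takes a while in pure Python:

for trial in range(5):
    n = 1_000_000
    ones = sum(leading_digit(benford_number()) == 1 for _ in range(n))
    print(f"trial {trial + 1}: {ones:,} numbers with leading digit 1")

Each count lands near 301,000 and wanders from run to run by roughly SQRT(1,000,000) = 1,000, just as described above.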
