Appendix AC. The Law of Large Numbers

The Law of Large Numbers is a theorem that describes large collections of numbers or observations subject to independent and identically distributed random variation, such as the results of performing the same measurement a large number of times. The average of the results obtained from a large number of trials should be close to the expected value (the true long-run average) and will tend to become closer as more trials are performed. This is an important idea because it guarantees stable long-term results for the averages of some random events. It is why casinos are able to make money: their games are designed to give the house a small advantage in the long run but highly variable results in the short term, guaranteeing plenty of (noisy) winners, which encourages the gamblers, but an even greater number of (usually quiet) losers. It is also why investors in the stock market often make money in the long run, despite the unpredictable day-to-day variation, up one day and down the next, and why it is so hard to see climate change in the much wilder short-term day-to-day and year-to-year swings in the weather. The short term is close and easy to see; the long term is harder to see from here.


But "The average ... will tend to become closer as more trials are performed" does not mean that the average becomes steadily and irreversibly closer. In fact, the average can wander around quite a bit. Take the example above, which shows the running average of a set of normally distributed independent random numbers with a population mean of 1.000 and a standard deviation of 1.000, as more and more numbers from that population are averaged, up to 1000. (This is generated by the Matlab script RunningAverage.m, shown on the left). Note that the average wanders around, reaching and crossing over the true population average twice in this case before ending up near 1.0 after 1000 points are accumulated. But if you ran this script again, the final average may not be so close to 1.0. In fact, the predicted standard deviation of the average of 1000 random numbers is reduced by a factor of 1/sqrt(1000), which is about 0.031, or 3% relative, meaning that most results will fall within 6% of the true average of 1.000, that is, between 0.94 and 1.06.

The uncertainty of uncertainty. The situation is even worse if you wish to estimate the standard deviation of a population from small samples. The Matlab script RunningStandardDeviation.m simulates this for the same population as in the previous example.
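The author's RunningStandardDeviation.m script is likewise not reproduced here; a minimal sketch of the same kind of simulation, again assuming a normal population with mean 1.000 and standard deviation 1.000, might be:

    % Minimal sketch of a running-standard-deviation simulation (not the
    % author's RunningStandardDeviation.m; a simplified version of the idea).
    NumPoints = 1000;
    x = 1 + randn(1, NumPoints);          % normal population: mean 1, SD 1
    RunningSD = zeros(1, NumPoints);
    for n = 2:NumPoints                   % need at least 2 points for std
        RunningSD(n) = std(x(1:n));       % sample standard deviation of first n points
    end
    plot(2:NumPoints, RunningSD(2:NumPoints))
    xlabel('Number of points in sample')
    ylabel('Sample standard deviation')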


As shown in the graph above, the sample standard deviation wanders around alarmingly for small samples and settles down only slowly. Even worse, for very small samples the sample standard deviation is biased low, often returning values far below the population standard deviation.

There is a well-documented tendency for people to overestimate the quality of small numbers of observations, sometimes referred to as hasty generalization, insensitivity to sample size, or the gambler's fallacy. This tendency was studied by a famous pair of psychologists, Amos Tversky and Daniel Kahneman, who collaborated on a long-running study of human cognitive biases beginning in the 1970s. They hypothesized that people tend to believe in a false "Law of Small Numbers", the name they coined for the mistaken belief that a small sample drawn from a large population is representative of that population. We would like to believe that scientists are immune to these foibles and always think logically and correctly, but scientists are only human, so it is important to be aware of this tendency, particularly when a small sample of data supports your favorite hypothesis. It is tempting to stop there, "while you are ahead"; that is a form of confirmation bias. Don't do it.

Of course, in many practical experimental measurements you may genuinely be constrained to a rather small number of repeated measurements. There may be a fixed number of data points and no possibility of gathering more, or the cost, in money or in time, of gathering more data may be excessive, even in a laboratory environment. For example, calibrating an analytical instrument for quantitative measurement may involve the preparation and measurement of several standard samples or solutions of known composition. If the calibration curve (the relationship between instrument reading and sample composition) is non-linear, it takes several different standards to define the curve, and you have to consider not only the cost of preparing many standards but also the cost of cleaning up and safely storing or disposing of the (potentially hazardous) chemicals afterwards. The bottom line is that, if you are limited to a small number of data points, you should not overstate the precision of your results. To use the 3-sigma rule to determine uncertainty ranges for a set of data, the distribution must be normal (Gaussian) and you need to know the standard deviation; the problem is that, for small sets of data, both are uncertain, as the simulation sketched below illustrates.
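As a purely illustrative (and hypothetical) addition, not part of the scripts named above, the following sketch draws many small samples of n = 5 points from the same normal population (mean 1.000, standard deviation 1.000) and reports how widely the sample mean and sample standard deviation spread:

    % Hypothetical illustration: spread of the sample mean and sample SD
    % for many repeated small samples (n = 5) from a normal population.
    NumTrials = 10000;                    % number of repeated small samples
    n = 5;                                % points per sample
    SampleMeans = zeros(1, NumTrials);
    SampleSDs = zeros(1, NumTrials);
    for k = 1:NumTrials
        s = 1 + randn(1, n);              % one small sample: mean 1, SD 1
        SampleMeans(k) = mean(s);
        SampleSDs(k) = std(s);
    end
    sm = sort(SampleMeans);
    ss = sort(SampleSDs);
    lo = round(0.025*NumTrials);          % indexes bounding the central 95% of values
    hi = round(0.975*NumTrials);
    fprintf('Sample mean: central 95%% of values between %.2f and %.2f\n', sm(lo), sm(hi))
    fprintf('Sample SD:   central 95%% of values between %.2f and %.2f\n', ss(lo), ss(hi))

Running this shows that, with only 5 points, both the sample mean and especially the sample standard deviation can individually fall far from their true values of 1.000, which is why uncertainty ranges based on such small samples should be quoted with caution.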


This page is part of "A Pragmatic Introduction to Signal Processing", created and maintained by Prof. Tom O'Haver, Department of Chemistry and Biochemistry, The University of Maryland at College Park. Comments, suggestions and questions should be directed to Prof. O'Haver at toh@umd.edu. Updated July, 2022.