The Law of Large Numbers is a theorem that describes large collections of numbers or observations that are subject to independent and identically distributed random variation, such as the results of performing the same measurement a large number of times. The average of the results obtained from a large number of trials should be close to the true long-term value, and it tends to become closer as more trials are performed. This is an important idea because it guarantees stable long-term results for the averages of random events. It is why gambling casinos are able to make money: their games are designed to give the casino a small advantage in the long run but highly variable results in the short term, guaranteeing plenty of (noisy) winners, which encourages the gamblers, but an even greater number of (usually quiet) losers. It is also why investors in the stock market often make money in the long run, despite the unpredictable day-to-day variation, up one day and down the next, and why it is so hard to see climate change amid the much wilder short-term hot and cold day-to-day and year-to-year swings in the weather. The short term is close and easy to see; the long term is much harder to see from here.
But "the average ... will tend to become closer as more trials are performed" does not mean that the average becomes steadily and irreversibly closer. In fact, the average can wander around quite a bit. Consider the running average of a set of normally distributed independent random numbers with a population mean of 1.000 and a standard deviation of 1.000, as more and more numbers from that population are averaged, up to 1000. (This is generated by the Matlab script RunningAverage.m; a minimal sketch of such a script appears below.) The average wanders around, reaching and crossing over the true population average (twice in one example run) before ending up near 1.0 after 1000 points have been accumulated. But if you run the script again, the final average may not be so close to 1.0. In fact, the predicted standard deviation of the average of 1000 such random numbers is smaller than the population standard deviation by a factor of 1/sqrt(1000), which makes it about 0.031, or 3% relative, meaning that most results (those within two standard deviations of the mean, about 95% of them) will fall within 6% of the true average of 1.000, that is, between 0.94 and 1.06.
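If you want to try this yourself, the following is a minimal Matlab sketch of such a simulation. It is not the actual RunningAverage.m script; the variable names and plotting details are illustrative only.

    % Minimal sketch of a running-average simulation (illustrative;
    % not the actual RunningAverage.m). Population: mean 1.000, SD 1.000.
    n = 1000;                      % number of random numbers to average
    x = 1 + randn(1,n);            % independent normal random numbers
    runmean = cumsum(x)./(1:n);    % running average after each new point
    plot(1:n, runmean, 1:n, ones(1,n), ':')
    xlabel('Number of points averaged')
    ylabel('Running average')
    % Predicted SD of the final average: 1/sqrt(1000), about 0.031

Each run of this sketch produces a different wandering path, but the final average almost always lands within a few percent of 1.000, in line with the 1/sqrt(n) prediction.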
The uncertainty of uncertainty. The situation is even worse if you wish to estimate the standard deviation of a population from small samples. The Matlab script RunningStandardDeviation.m simulates this for the same population.

As the graph produced by such a simulation shows, the sample standard deviation wanders around alarmingly for small samples and settles down only slowly. Even worse, the standard deviation of very small samples is biased low, often returning values far below the population standard deviation.
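Again, here is a minimal Matlab sketch of such a simulation (illustrative; not the actual RunningStandardDeviation.m):

    % Minimal sketch of a running sample standard deviation
    % (illustrative; not the actual RunningStandardDeviation.m).
    n = 1000;                      % number of random numbers
    x = 1 + randn(1,n);            % same population: mean 1, SD 1
    runstd = zeros(1,n);           % running sample SD (undefined at k=1)
    for k = 2:n
        runstd(k) = std(x(1:k));   % sample SD of the first k points
    end
    semilogx(2:n, runstd(2:n), 2:n, ones(1,n-1), ':')
    xlabel('Number of points included')
    ylabel('Running sample standard deviation')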
There is a well-documented tendency for people to overestimate the quality of small numbers of observations, sometimes referred to as hasty generalization, insensitivity to sample size, or the gambler's fallacy. This tendency was studied by a famous pair of psychologists, Amos Tversky and Daniel Kahneman, who collaborated on a long-running study of human cognitive biases in the 1970s. They formulated the hypothesis that people tend to believe in a false "Law of Small Numbers", the name they coined for the mistaken belief that a small sample drawn from a large population is representative of that large population. We would like to believe that scientists are immune to these foibles and that they always think logically and correctly. But scientists are only human, so it is important to be aware of this tendency, particularly when a small sample of data supports your favorite hypothesis. It is tempting to stop there, "while you are ahead". This is called "confirmation bias". Don't do it.
        
Of course, in many practical experimental measurements you may really be constrained to a rather small number of repeated measurements. There may be a fixed number of data points and no possibility of gathering more. Or the cost, in money or in time, of gathering more data may be excessive, even in a laboratory environment. For example, the process of calibrating an analytical instrument for quantitative measurement may involve the preparation and measurement of several standard samples or solutions of known composition. If the calibration curve (the relationship between instrument reading and sample composition) is non-linear, it takes several different standards to define the curve. You have to consider not only the cost of preparing many standards but also the cost of cleaning up and safely storing or disposing of the (potentially hazardous) chemicals afterwards. The bottom line is: if you are limited to a small number of data points, do not over-represent the precision of your results. To use the 3-sigma rule to determine uncertainty ranges for a set of data, the distribution must be normal (Gaussian) and you need to know the standard deviation. The problem is that, for small sets of data, both of these are uncertain.
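To get a feeling for just how uncertain the standard deviation itself can be, you can simulate many repetitions of a small-sample measurement and look at the spread of the resulting sample standard deviations. The following Matlab sketch (illustrative; not one of this book's scripts) does this for samples of only 5 points drawn from the same population as above:

    % Illustrative sketch: repeat a 5-point measurement many times and
    % examine the spread of the sample standard deviations.
    % Population: mean 1.000, SD 1.000, as in the examples above.
    ntrials = 10000;               % number of simulated experiments
    n = 5;                         % points per experiment
    s = std(1 + randn(n,ntrials)); % sample SD of each column (each trial)
    histogram(s, 50)               % distribution of the sample SDs
    xlabel('Sample standard deviation of 5 points')
    ylabel('Count')

In a typical run, the 5-point sample standard deviations range from well under 0.5 to well over 1.5, so "3-sigma" limits computed from any single small sample can easily be far too narrow or far too wide.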
              
This page is part of "A Pragmatic Introduction to Signal Processing", created and maintained by Prof. Tom O'Haver, Department of Chemistry and Biochemistry, The University of Maryland at College Park. Comments, suggestions and questions should be directed to Prof. O'Haver at toh@umd.edu. Updated July 2022.