
Experimental measurements are never perfect, even with
sophisticated modern instruments. Two main types of measurement
errors are recognized: (a) The term "signal" actually has two meanings: in the more general sense, it can mean the

If you are lucky enough to have a sample and an instrument that are completely stable (

But what if the measurements are not that reproducible, or you have only one recording of that spectrum and no other data? In that case, you could try to estimate the noise in that single recording, based on the

It's important to appreciate that the standard deviation calculated from a small set of measurements can be much higher or much lower than the actual standard deviation of a larger number of measurements. For example, the Matlab/Octave function

A quick but approximate way to estimate the amplitude of noise visually is the

In addition to the

The quality of a signal is
often expressed quantitatively as the *signal-to-noise ratio*
(S/N ratio), which is the ratio of the true underlying signal
amplitude (e.g. the average amplitude or the peak height) to the
standard deviation of the noise. Thus the S/N ratio of the
spectrum in Figure 1 is about 0.08/0.001 = 80, and the signal in
Figure 3 has an S/N ratio of 1.0/0.2 = 5. So we would
say that the quality of the signal in Figure 1 is better than
that in Figure 3 because it has a greater S/N ratio.
Measuring the S/N ratio is much easier if the noise can be
measured separately, in the absence of signal. Depending on the
type of experiment, it may be possible to acquire readings of
the noise alone, for example on a segment of the baseline before
or after the occurrence of the signal. However, if the magnitude
of the noise depends on the level of the signal, then the
experimenter must try to produce a constant signal level to
allow measurement of the noise on the signal. In some cases,
where you can model the shape of the signal accurately by means
of a mathematical function (such as a polynomial or the weighted sum
of a number of peak shape
functions), the noise may be isolated by subtracting the model
from the unsmoothed experimental signal, for example by looking
at the residuals in least-squares curve fitting, as in this example. If
possible, it's usually better to determine the standard
deviation of repeated measurements of the thing that you want to
measure (e.g. the peak heights or areas), rather than trying to
estimate the noise from a single recording of the data.
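
The baseline approach described above can be sketched in Matlab/Octave with simulated data. This is only an illustration under stated assumptions: the peak position, width, and noise level are made up (chosen so that the true S/N ratio is 1.0/0.2 = 5), and the first 400 points are assumed to contain no signal.

```matlab
% Sketch: estimate the S/N ratio of a single recording from a signal-free
% baseline segment. All numbers here are illustrative assumptions.
x = 1:1000;
pos = 700; wid = 60; height = 1.0;         % hypothetical peak parameters
noise_sd = 0.2;                            % true noise standard deviation
signal = height * exp(-4*log(2)*((x - pos)/wid).^2);   % Gaussian, wid = FWHM
y = signal + noise_sd * randn(size(x));    % simulated noisy recording

baseline = y(1:400);                       % region assumed to contain no signal
noise_est = std(baseline);                 % noise estimated from baseline alone
peak_est = max(y) - mean(baseline);        % crude peak height (biased high by noise)
snr_est = peak_est / noise_est;

fprintf('Estimated noise = %.3f, estimated S/N = %.1f\n', noise_est, snr_est);
```

Taking the maximum of the noisy data tends to overestimate the peak height, so the estimated S/N ratio will usually come out somewhat above the true value; fitting a model to the peak would give a less biased height.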

One key thing that really distinguishes signal from noise is that random
noise is not the same from one measurement of the signal to the
next, whereas the genuine signal is at least partially
reproducible. So if the signal can be measured more than once,
you can exploit this fact by measuring the signal over and
over again, as fast as is practical, adding up all the
measurements point-by-point, and then dividing by the number of
signals averaged. This is called *ensemble averaging*, and
it is one of the most powerful methods for improving signals,
when it can be applied. For this to work properly, the noise
must be random and the signal must occur at the same time in
each repeat. An example is shown in the figure below.

*Window 1 (left) is a single measurement of a very noisy
signal. There is actually a broad peak near the center of this
signal, but it is difficult to measure its position, width,
and height accurately because the S/N ratio is very poor.
Window 2 (right) is the average of 9 repeated measurements of
this signal, clearly showing the peak emerging from the noise.
The expected improvement in S/N ratio is 3 (the square root of
9). Often it is possible to average hundreds
of measurements, resulting in much more substantial
improvement. The S/N ratio in the resulting average signal in
this example is about 5.*
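
The square-root improvement can be checked with a small Matlab/Octave simulation. This is a minimal sketch, not the code that produced the figure: the peak shape, the noise level, and the number of points are assumptions chosen only for illustration.

```matlab
% Sketch of ensemble averaging with simulated data (assumed parameters).
x = 1:1000;
nrep = 9;                                  % number of repeat measurements
noise_sd = 1.0;
signal = exp(-4*log(2)*((x - 500)/200).^2);  % broad Gaussian peak, height 1

% Each row is one repeat: same signal, independent random noise.
repeats = repmat(signal, nrep, 1) + noise_sd * randn(nrep, numel(x));

% Add point-by-point and divide by the number of repeats.
avg = sum(repeats, 1) / nrep;

% Residual noise before and after averaging (the true signal is known here).
noise_single = std(repeats(1,:) - signal);
noise_avg = std(avg - signal);
improvement = noise_single / noise_avg;

fprintf('Noise: single = %.3f, average of %d = %.3f, ratio = %.2f\n', ...
        noise_single, nrep, noise_avg, improvement);
```

With 9 repeats the ratio should come out close to 3; changing `nrep` to 100 should give a ratio close to 10.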

Noise that has a more low-frequency-weighted character, that is, that has more power at low frequencies than at high frequencies, is often called "pink noise". In the acoustical domain, pink noise sounds more like a

Conversely, noise that has more power at high frequencies is called “blue” noise. This type of noise is less commonly encountered in experimental work, but it can occur in processed signals that have been subject to some sort of differentiation process or that have been deconvoluted from some blurring process. Blue noise is

Often, there is a mix of noises with different behaviors; in optical spectroscopy, three fundamental types of noise are recognized, based on their origin and on how they vary with light intensity: photon noise, detector noise, and flicker (fluctuation) noise. Photon noise (often the limiting noise in instruments that use photo-multiplier detectors) is white and is proportional to the

Only in a very few special cases is it possible to eliminate noise completely, so usually you must be satisfied with increasing the S/N ratio as much as possible. The key in any experimental system is to understand the possible sources of noise, break down the system into its parts and measure the noise generated by each part separately, then seek to reduce or compensate for as much of each noise source as possible. For example, in optical spectroscopy, source flicker noise can often be reduced or eliminated by using feedback stabilization, choosing a better light source, using an internal standard, or specialized instrument designs such as double-beam, dual wavelength, derivative, and wavelength modulation. The effect of photon noise and detector noise can be reduced by increasing the light intensity at the detector or increasing the spectrometer slit width, and electronics noise can sometimes be reduced by cooling or upgrading the detector and/or electronics. Fixed pattern noise in array detectors can be corrected in software. Only

This is easily demonstrated by a little simulation. In the example on the left, we start with a set of 100,000 uniformly distributed random numbers that have an equal chance of having any value between certain limits - between 0 and +1 in this case (like the "rand" function in most spreadsheets and Matlab/Octave). The graph in the upper left of the figure shows the probability distribution, called a “histogram”,

Remarkably, the distributions of the individual events hardly matter at all. You could modify the individual distributions in this simulation by including additional functions, such as sqrt(rand), sin(rand), rand^2, log(rand), etc., to obtain other radically non-normal individual distributions. It seems that no matter what the distribution of the single random variable might be, by the time you combine even as few as four of them, the resulting distribution is already visually close to normal. Real-world macroscopic observations are often the result of thousands or millions of individual microscopic events, so whatever the probability distributions of the individual events, the combined macroscopic observations approach a normal distribution essentially perfectly. It is on this common adherence to normal distributions that the common statistical procedures are based; the use of the mean, standard deviation σ, least-squares fits, confidence limits, etc., are all based on the assumption of a normal distribution. Even so, experimental errors and noise are not always normal; sometimes there are very large errors that fall well beyond the “normal” range. They are called “outliers” and they can have a very large effect on the standard deviation σ. In such cases it's common to use the “interquartile range” (IQR), defined as the difference between the upper and lower quartiles, instead of the standard deviation, because the interquartile range is not affected by a few outliers. For a normal distribution, the interquartile range is equal to 1.34896 times the standard deviation.
A quick way to check the distribution of a large set of random numbers is to compute both the standard deviation and the interquartile range divided by 1.34896; if they are roughly equal, the distribution is probably normal; if the standard deviation is much larger, the data set probably contains outliers, and the standard deviation without the outliers can be better estimated by dividing the interquartile range by 1.34896.
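
Both ideas, the approach to normality and the interquartile-range check, can be tried in a few lines of Matlab/Octave. The following is a rough sketch under assumptions of my own: the quartiles are approximated by indexing the sorted data (adequate for large samples, and it avoids toolbox functions), and the outliers are simulated by adding an arbitrary offset of 50 to 1% of the points.

```matlab
% Sketch: sums of a few uniform random numbers approach a normal distribution,
% and the IQR gives an outlier-resistant estimate of the standard deviation.
n = 100000;
u4 = sum(rand(4, n), 1);                   % sum of four uniform variables

% For a large sample, approximate quartiles by indexing the sorted data.
s = sort(u4);
iqr4 = s(round(0.75*n)) - s(round(0.25*n));
sd_direct = std(u4);
sd_from_iqr = iqr4 / 1.34896;

% Contaminate 1% of the points with large errors (hypothetical outliers).
bad = u4;
bad(1:1000) = bad(1:1000) + 50;
sb = sort(bad);
sd_bad = std(bad);
sd_bad_iqr = (sb(round(0.75*n)) - sb(round(0.25*n))) / 1.34896;

fprintf('Clean data:    std = %.3f, IQR/1.34896 = %.3f\n', sd_direct, sd_from_iqr);
fprintf('With outliers: std = %.3f, IQR/1.34896 = %.3f\n', sd_bad, sd_bad_iqr);
```

For the clean data the two estimates should agree to within a few percent, even though only four uniform variables were combined. With the outliers added, the ordinary standard deviation increases several-fold while the IQR-based estimate barely changes.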

It is important to understand that the three characteristics of noise just discussed in the paragraphs above - the frequency distribution, the amplitude distribution, and the signal dependence - are mutually independent; a noise may in principle have any combination of those properties.

**Visual animation of ensemble averaging.** This 17-second video (EnsembleAverage1.wmv)
demonstrates the ensemble averaging of 1000 repeats of a signal
with a very poor S/N ratio. The signal itself consists of three
peaks located at x = 50, 100, and 150, with peak heights 1, 2,
and 3 units. These signal peaks are buried in random noise whose
standard deviation is 10. Thus the S/N ratio of the smallest
peak is 0.1, which is far too low to even see a signal, much less
measure it. The video shows the accumulating average signal as
1000 measurements of the signal are performed. At the end, the
noise is reduced (on average) by the square root of 1000 (about
32), so that the S/N ratio of the smallest peak ends up being
about 3, just enough to detect the presence of a peak reliably.
Click here to download
the video (2 MBytes) in WMV format. (This demonstration was
created in Matlab 6.5. If you have access to that software, you
may download the original m-file, EnsembleAverage.zip).

SPECTRUM, the Macintosh freeware signal-processing application that accompanies this tutorial, includes several functions for measuring signals and noise in the

Popular spreadsheets, such as Excel or Open Office Calc, have built-in functions that can be used for calculating, measuring and plotting signals and noise. For example, the cell formula for one point on a

Most spreadsheets have only a

The interquartile range (IQR) can be calculated in a spreadsheet by subtracting the first quartile from the third (e.g.

Matlab and Octave have built-in functions that can be used for calculating, measuring and plotting signals and noise, including mean, max, min, std, kurtosis, skewness, plot, hist, histfit, rand, and randn. Just type "help" and the function name at the command >> prompt, e.g. "help mean". Most of these functions apply to vectors and matrices as well as scalar variables. For example, if you have a series of results in a vector variable 'y',

As an example of the "randn" function in Matlab/Octave, it is used here to generate 100 normally-distributed random numbers, then the "hist" function computes the "histogram" (probability distribution) of those random numbers, then the downloadable function peakfit.m fits a Gaussian function (plotted with a red line) to that distribution:

>> peakfit([X;N])

If you change the 100 to 1000 or a higher number, the distribution becomes closer and closer to a perfect Gaussian and its peak falls closer to 0.00. The "randn" function is useful in signal processing for predicting the uncertainty of measurements in the presence of random noise, for example by using the Monte Carlo or the bootstrap methods that will be described in a later section. (You can copy and paste, or drag and drop, these two lines of code into the Matlab or Octave editor or into the command line and press

Here is an MP4 animation that demonstrates the gradual emergence of a Gaussian normal distribution as the number of samples increases from 2 to 1000. Note how many samples it takes before the normal distribution is well-formed.
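
The effect of sample size can also be explored numerically without peakfit.m. The sketch below is an assumption-laden stand-in: it builds the histogram with `hist` and `randn` as described above (the choice of 20 bins is mine), and instead of fitting a Gaussian it simply computes the centroid of the histogram as a rough measure of the peak position.

```matlab
% Sketch: histograms of normally distributed random numbers for increasing
% sample sizes. The variable names N and X follow the peakfit call shown
% in the text; the 20-bin setting is an arbitrary choice.
sizes = [100 1000 100000];
centers = zeros(size(sizes));
for k = 1:numel(sizes)
    [N, X] = hist(randn(1, sizes(k)), 20); % N = counts, X = bin centers
    centers(k) = sum(N .* X) / sum(N);     % centroid of the histogram
    fprintf('%6d samples: histogram centroid = %+.4f\n', sizes(k), centers(k));
end
```

The centroid should wander noticeably for 100 samples and settle close to zero for the largest sample size. Typing `bar(X, N)` after the loop displays the last histogram.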

**The difference between scripts and functions**. You can also create
your own user-defined scripts and functions in Matlab or Octave
to automate commonly-used algorithms. Scripts and functions are
simple text files saved with a ".m" file extension. The
difference between a script and a function is that a
function definition begins with the word 'function'; a script is
just any list of Matlab commands and statements. For a *script*,
all the variables defined and used are listed in the workspace
window. For a *function*, on the other hand, the variables
are *internal and private to that function*; values can be
passed *to* the function through the *input* arguments,
and values can be passed *from* the function through the
*output* arguments, which are both defined in the first
line of the function definition. That means that functions are a
great way to package chunks of code that perform useful
operations in a form that can be used as components in other
programs *without worrying that the variable names in the
function will conflict and cause errors*. Scripts and
functions can call other functions; scripts
must have those functions in the Matlab path; functions, on
the other hand, *can have all their required
sub-functions defined within the main function itself and
thus can be self-contained*. (If you run
one of my scripts and get an error message that says
"`Undefined function...`", you need to download the specified function
from functions.html
and place it in the Matlab/Octave path). Note:
in Matlab R2016b or later, you CAN include functions within
scripts (see https://www.mathworks.com/help/matlab/matlab_prog/local-functions-in-scripts.html).

For writing or editing scripts and
functions, Matlab and the latest version of Octave have an
internal editor. For an explanation of a function and a simple
worked example, type “help function” at the command prompt. When
you are writing your own functions or scripts, you should always
add lots of "comment lines", beginning with the character %,
that explain what is going on. *You'll be glad you did later*.
The first group of comment lines, up to the first line
that does not begin with a % (a blank line, for example), is
considered to be the "help file" for that script or function.
Typing "help NAME" displays
those comment lines for the function or script NAME in the
command window, just as it does for the built-in functions and
scripts. This will make your scripts and functions
much easier to understand and use, both by other people and by
yourself in the future. Resist the temptation to skip this.
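
As an illustration of these conventions, here is what a small function file might look like. The name `relsd` is hypothetical (it is not one of the downloadable functions), and the file would need to be saved as relsd.m somewhere in the Matlab/Octave path.

```matlab
function r = relsd(y)
% RELSD  Relative standard deviation of a vector, in percent.
%   r = relsd(y) returns 100*std(y)/mean(y).
%   This is a hypothetical illustration, not one of the downloadable
%   functions; this first comment block is what "help relsd" displays.
%
%   Example:  relsd([9 10 11])

% The blank line above ends the help text; this comment is not shown.
r = 100 * std(y) / mean(y);   % y and r are private to the function
end
```

The input argument `y` and the output argument `r` are both declared in the first line, so a variable named `y` or `r` in the calling script or workspace is left untouched when the function runs.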

Here's a very handy helper: when you type a
function name into the Matlab editor, if you *pause for a
moment* after typing the open parenthesis immediately after
the function name, Matlab will display a pop-up listing all the
possible input arguments as a reminder. *This works even for
downloaded functions and for any new functions that you
yourself create*. It's especially handy when there are so
many possible input arguments that it's hard to remember all of
them. The popup *stays on the screen as you type*,
highlighting each argument in turn:

This feature is easily overlooked, but it's very handy. Clicking
on __More Help...__ on the right displays the help for
that function in a separate window.

**Some examples** of my Matlab/Octave
user-defined functions related to signals and noise that
you can download and use are: stdev.m, a
standard deviation function that works in both Matlab and
Octave; rsd.m, the relative standard
deviation; halfwidth.m, for measuring
the full width at half maximum of smooth peaks; plotit.m, an easy-to-use
function for plotting and fitting x,y data in
matrices or in separate vectors; functions
for peak shapes commonly encountered in analytical
chemistry (such as Gaussian, Lorentzian, lognormal, Pearson 5,
exponentially-broadened Gaussian, exponentially-broadened
Lorentzian, exponential pulse, sigmoid, Gaussian/Lorentzian
blend, bifurcated Gaussian, bifurcated Lorentzian, Voigt
profile, and triangular); and peakfunction.m, a
function that generates any of those peak types
specified by number. ShapeDemo
demonstrates the 12 basic peak shapes graphically, showing the
variable-shape peaks as multiple lines. There are also
functions for different types of random noise (white noise, pink noise, blue noise, proportional noise, and square root noise); a
function that applies exponential broadening (ExpBroaden.m);
a function that computes the interquartile range (IQrange.m);
a function that estimates the standard deviation of
a distribution with outliers by computing the
interquartile range and dividing it by 1.34896 (stdiqr.m); a function that
removes "not-a-number" entries from vectors (rmnan.m); and a function that
returns the index and the value of the element of
vector x that is closest to a particular value (val2ind.m). These functions
can be useful in modeling and simulating analytical
signals and testing measurement techniques.
You can click or ctrl-click on these links to inspect the code, or you can right-click
and select "Save link as..." to download them to your computer.
Once you have downloaded those functions and placed them in the
"path", you can use them just like any built-in function.
For example, you can plot a simulated Gaussian peak with white
noise by typing: `x=[1:256]; y=gaussian(x,128,64) + whitenoise(x); plot(x,y)`. The
script plotting.m uses the gaussian.m function to demonstrate the
distinction between the *height*, *position*, and *width*
of a Gaussian curve. The script SignalGenerator.m
calls several of these downloadable functions to create and plot
a realistic computer-generated signal with multiple peaks on a
variable baseline plus variable random noise; you might try to
modify the variables in the indicated places to make it look
like your type of data. All of these functions will work in the
latest version of Octave
without change. For a complete list of downloadable
functions and scripts developed for this project, see functions.html.
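
If you want to experiment before downloading anything, the following sketch uses anonymous functions as rough stand-ins for the peak and noise functions. These are not the downloadable versions: the argument order (x, position, width), the use of the full width at half maximum as the width, and the noise level of 0.1 are all assumptions made for illustration.

```matlab
% Sketch: stand-ins for the downloadable gaussian.m and noise functions,
% written as anonymous functions. The argument order (x, position, width)
% and the use of full width at half maximum are assumptions.
gauss = @(x, pos, wid) exp(-4*log(2)*((x - pos)/wid).^2);

x = 1:256;
s = gauss(x, 128, 64);                     % noiseless peak, height 1
sd = 0.1;                                  % illustrative noise level

y_white = s + sd * randn(size(x));             % constant white noise
y_prop  = s + sd * s .* randn(size(x));        % noise proportional to signal
y_sqrt  = s + sd * sqrt(s) .* randn(size(x));  % noise proportional to sqrt(signal)

% Compare the noise in the tail of the peak (first 20 points) for each type.
tail = 1:20;
fprintf('Tail noise: white = %.4f, proportional = %.5f, square root = %.4f\n', ...
        std(y_white(tail) - s(tail)), std(y_prop(tail) - s(tail)), ...
        std(y_sqrt(tail) - s(tail)));
```

Typing `plot(x, y_white, x, y_prop, x, y_sqrt)` afterwards shows the three cases together: the constant noise is the same everywhere, whereas the proportional and square-root noises nearly vanish in the tails where the signal is close to zero.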

The Matlab/Octave script EnsembleAverageDemo.m
demonstrates ensemble averaging to improve the S/N ratio of a
very noisy signal. Click for
graphic. The script requires the "gaussian.m"
function to be downloaded and placed in the Matlab/Octave path,
or you can use another peak shape function, such as lorentzian.m or rectanglepulse.m.

The Matlab/Octave function noisetest.m demonstrates the
appearance and effect of different noise types. It
plots Gaussian peaks with four different types of
added noise: constant white noise, constant pink (1/f) noise,
proportional white noise, and square-root white noise, then fits
a Gaussian to each noisy data set and computes the average and
the standard deviation of the peak height, position, width and
area for each noise type. Type "help noisetest" at the command
prompt. The Matlab/Octave script SubtractTwoMeasurements.m
demonstrates the technique of subtracting two separate
measurements of a waveform to extract the random noise (but
it works only if the signal is stable, except for the
noise). Graphic.
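
The subtraction technique can be sketched as follows. This is not the SubtractTwoMeasurements.m script itself, just a minimal illustration with a made-up waveform and noise level; the division by the square root of 2 is my addition, based on the fact that the difference of two independent noises of equal standard deviation has a standard deviation larger by that factor.

```matlab
% Sketch: extract the noise by subtracting two recordings of a stable signal.
x = 1:2000;
signal = exp(-4*log(2)*((x - 1000)/300).^2);   % assumed stable waveform
noise_sd = 0.05;                               % true noise level (illustrative)
y1 = signal + noise_sd * randn(size(x));       % first measurement
y2 = signal + noise_sd * randn(size(x));       % second measurement

d = y1 - y2;                 % the signal cancels; only noise remains
% The difference of two independent noises has sqrt(2) times the standard
% deviation of either one, so divide by sqrt(2).
noise_est = std(d) / sqrt(2);

fprintf('True noise = %.4f, estimated = %.4f\n', noise_sd, noise_est);
```

If the signal drifts between the two measurements, the difference will contain a residual of the signal as well as the noise, and the estimate will be too high.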

For signals that contain repetitive waveform patterns occurring in one continuous signal, with nominally the same shape except for noise, the interactive peak detector function iPeak has an ensemble averaging function (

See Appendix S: Measuring the Signal-to-Noise Ratio of Complex Signals for more examples of S/N ratio in Matlab/Octave.

French translation of earlier version of this page.

This page is part of "
