Experimental measurements are never perfect, even with sophisticated modern
instruments. Two main types of measurement errors are recognized: (a)
systematic error, in which every measurement is consistently less than or
greater than the correct value by a certain percentage or amount, and (b) random
error, in which there are unpredictable variations in the measured signal
from moment to moment or from measurement to measurement. This latter type of
error is often called noise,
by analogy to acoustic noise. There are
many sources of noise in physical measurements, such as building
vibrations,
air currents, electric power fluctuations, stray radiation from nearby
electrical equipment, static electricity, interference from radio and
TV transmissions, turbulence in the flow of gases or liquids, random
thermal motion of molecules, background radiation from natural
radioactive elements, "cosmic rays" from outer space (seriously), and
even the basic quantum nature of matter and
energy itself.
One of the fundamental problems in signal
measurement is distinguishing the noise from the signal. It is not
always easy. The signal
is the "important" part of the data that you
want to measure - it might be the average of the signal over a certain
time period, or it might be the height of a peak or the area under a
peak that occurs in the data. For example, in the absorption spectrum
in the right-hand half of
Figure 1 in the previous section, the "important" parts of the data
are
probably the absorption peaks located at 520 and 550 nm. The height of
either of those peaks might be considered the signal, depending on
the application. In this example, the height of the largest peak is about
0.08 absorbance units. The noise would be the standard deviation of that peak height from spectrum to spectrum (if
you had access to repeat measurements of the same spectrum). But
what if you had only one recording of that spectrum? In that case, you'd be forced to estimate
the noise in that single recording, based on the assumption that the visible short-term
fluctuations in the signal (the little random wiggles superimposed on
the smooth signal) are noise and not part of the signal. In this case, those fluctuations amount to
about 0.005 units peak-to-peak or a standard deviation of 0.001.
(For random fluctuations, the general rule of thumb is that the
standard
deviation is approximately 1/5 of the peak-to-peak variation between the highest and
the lowest readings). As
another example, the data on the right half of Figure 3, below, has a
peak in the center with a height of about 1.0. The peak-to-peak
noise on the baseline is also about 1.0, so the standard deviation of
the noise is about 1/5th of that, or 0.2.
The quality of a signal is often expressed quantitatively as
the signal-to-noise ratio
(SNR), which is the ratio of the true signal
amplitude (e.g. the average amplitude or the peak height) to
the
standard deviation of the noise. Thus the signal-to-noise ratio
of the spectrum in Figure 1 is about 0.08/0.001 = 80, and the signal in
Figure 3 has a signal-to-noise ratio of 1.0/0.2 = 5.
So we would say that the quality of the signal
in Figure 1 is better than that in Figure 3 because it has a greater SNR. Signal-to-noise ratio is
inversely proportional to the relative
standard
deviation of the signal amplitude. Measuring the signal-to-noise
ratio is much easier if the noise can be measured separately, in the
absence
of signal. Depending on the type of experiment, it may be
possible to acquire readings of the noise alone, for example on a
segment of the baseline before or after the occurrence of the signal.
However, if the magnitude of the noise depends on the level of the
signal, then the
experimenter must try to produce a constant signal level to allow
measurement of the noise on the signal. (In cases where it
is possible to model the shape of the signal exactly by means of a
mathematical function, the noise may be estimated by subtracting the
model signal from the experimental signal). If possible, it's always
better to determine the standard deviation of repeated measurements of
the thing that you want to measure, rather than trying to estimate the
noise from a single recording of the data.
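The baseline method described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the peak shape, the noise level of 0.2, and the choice of baseline segment are made up to resemble the Figure 3 example (peak height about 1.0, signal-to-noise ratio about 5), and gaussian_peak is a hypothetical helper, not one of the functions that accompany this tutorial.

```python
import math
import random
import statistics

def gaussian_peak(x, position, fwhm):
    """Unit-height Gaussian centered at position, with full width at half maximum fwhm."""
    return math.exp(-4.0 * math.log(2.0) * ((x - position) / fwhm) ** 2)

rng = random.Random(0)      # fixed seed so the run is repeatable
true_noise_sd = 0.2         # assumed noise level, chosen to resemble Figure 3
x = range(1, 1001)
signal = [gaussian_peak(xi, 500, 60) + rng.gauss(0.0, true_noise_sd) for xi in x]

# Noise: standard deviation of a baseline segment well away from the peak.
baseline = signal[:300]
noise_sd = statistics.stdev(baseline)

# Rule of thumb from the text: peak-to-peak variation divided by 5.
noise_rule_of_thumb = (max(baseline) - min(baseline)) / 5.0

# Signal: peak height, taken here as the mean of 10 points at the top of
# the peak to reduce the effect of noise on the height estimate.
peak_height = statistics.mean(signal[495:505])

snr = peak_height / noise_sd
print(f"noise SD from baseline     = {noise_sd:.3f}")
print(f"noise SD from rule of thumb = {noise_rule_of_thumb:.3f}")
print(f"peak height                = {peak_height:.3f}")
print(f"signal-to-noise ratio      = {snr:.1f}")
```

The two noise estimates should agree only roughly; the peak-to-peak rule is a quick visual estimate, whereas the standard deviation uses every point in the baseline segment.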
Sometimes the signal and the noise can be partly distinguished on the basis of frequency components:
for example, the signal may contain mostly low-frequency components and
the noise may be located at higher frequencies. This is the basis
of filtering and smoothing.
In both Figure 1 and Figure 3, the peaks contain mostly
low-frequency components, whereas the noise is distributed over a much
wider frequency range.
One key thing that really
distinguishes signal from noise is that random noise is not the same from one measurement of the signal to the next, whereas the
genuine signal is at least partially reproducible. So if the signal can be
measured more than once, you can take advantage of this by measuring the signal
over and over again, as fast as is practical, adding up all the measurements
point-by-point, and then dividing by the number of signals averaged. This is called ensemble averaging,
and it is one of the
most powerful methods for improving signals, when it can be applied.
For this to work properly, the noise must be random and the signal must
occur at the same time in each repeat. An
example is shown in Figure 3. Another example (EnsembleAverage1.wmv)
demonstrates the ensemble averaging of 1000 repeats of a signal, which improves the signal-to-noise ratio by a factor of about 30.
Figure 3. Window 1 (left) is a single measurement of a very noisy
signal.
There is actually a broad peak near the center of this signal, but it
is not
possible to measure its position, width, and height accurately because
the
signal-to-noise ratio is very poor. Window 2 (right) is the
average of 9 repeated measurements of this signal, clearly showing the
peak
emerging from the noise. The expected improvement in signal-to-noise
ratio is
3 (the square root of 9). Often it is possible to average hundreds
of measurements, resulting in a much more substantial improvement.
The
signal-to-noise ratio in the resulting average signal in this example
is about 5.
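Ensemble averaging is simple enough to try in a short simulation. The following Python sketch is an illustration, not the code that produced Figure 3: the broad peak, the noise standard deviation of 0.6, and the function names are assumptions chosen so that averaging 9 repeats gives a signal-to-noise ratio near 5.

```python
import math
import random
import statistics

def ensemble_average(repeats):
    """Point-by-point mean of a list of equal-length signals."""
    n = len(repeats)
    return [sum(column) / n for column in zip(*repeats)]

rng = random.Random(1)          # fixed seed so the run is repeatable
npoints = 500
noise_sd = 0.6                  # assumed noise level for a single measurement
true_signal = [math.exp(-((i - 250) / 50.0) ** 2) for i in range(npoints)]

def measure():
    """One simulated measurement: the same signal plus fresh random noise."""
    return [s + rng.gauss(0.0, noise_sd) for s in true_signal]

single = measure()
averaged = ensemble_average([measure() for _ in range(9)])

# Residual noise = measurement minus the known true signal
# (possible only in a simulation, where the true signal is known).
noise_single = statistics.stdev([m - s for m, s in zip(single, true_signal)])
noise_averaged = statistics.stdev([m - s for m, s in zip(averaged, true_signal)])
improvement = noise_single / noise_averaged

print(f"noise in one measurement   = {noise_single:.3f}")
print(f"noise after averaging 9    = {noise_averaged:.3f}")
print(f"improvement factor         = {improvement:.2f}")
```

The improvement factor should come out close to 3, the square root of the number of repeats, because the signal adds up in direct proportion to the number of repeats while the random noise adds up only as its square root.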
Noise is also distinguished by its frequency spectrum. White noise has equal power at all frequencies. It derives its name from white light,
which has equal brightness at all wavelengths in the visible region. The
noise in the example signal in Figure 3 is white. In the
acoustical domain, white noise sounds like a hiss. In measurement
science, white noise is common, but so is noise that has a more low-frequency-weighted
character, that is, that has more power at low frequencies than at high
frequencies. This is often called "pink noise". In the acoustical domain, pink noise sounds more like a roar. A commonly-encountered sub-species of that type of noise is "1/f noise", where the noise power is inversely proportional to frequency. The application of smoothing and low-pass filtering
to reduce noise is more effective for white noise than for pink noise.
When pink noise is present, it is sometimes beneficial to apply
modulation techniques, such as optical chopping or wavelength modulation,
to convert a direct-current (DC) signal into an alternating current
(AC) signal, thereby increasing the frequency of the signal to a
frequency region where the noise is lower. In such cases it is common
to use a lock-in amplifier, or the digital equivalent thereof, to measure the amplitude of the signal.
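To see why smoothing helps less when the noise is weighted toward low frequencies, here is a rough Python sketch. It is an illustration under an explicit assumption: the low-frequency noise is imitated by passing white noise through a simple first-order recursive filter, which is not true pink or 1/f noise but shares its key property of having more power at low frequencies. The function name moving_average is hypothetical.

```python
import random
import statistics

def moving_average(data, width):
    """Unweighted sliding-average smooth; returns only the fully overlapped points."""
    window_sum = sum(data[:width])
    smoothed = [window_sum / width]
    for i in range(width, len(data)):
        window_sum += data[i] - data[i - width]
        smoothed.append(window_sum / width)
    return smoothed

rng = random.Random(2)          # fixed seed so the run is repeatable
n = 20000
white = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Stand-in for low-frequency-weighted noise (an assumption, not true 1/f noise):
# each point keeps 95% of the previous point and adds a fresh random step.
lowfreq = []
previous = 0.0
for _ in range(n):
    previous = 0.95 * previous + rng.gauss(0.0, 1.0)
    lowfreq.append(previous)

width = 9
reduction_white = statistics.pstdev(white) / statistics.pstdev(moving_average(white, width))
reduction_lowfreq = statistics.pstdev(lowfreq) / statistics.pstdev(moving_average(lowfreq, width))

print(f"white noise reduced by a factor of         {reduction_white:.2f}")
print(f"low-frequency noise reduced by a factor of {reduction_lowfreq:.2f}")
```

A 9-point smooth should reduce the white noise by about a factor of 3 (the square root of 9) but barely touch the low-frequency noise, because neighboring points of that noise are strongly correlated and so do not average out.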
Another property that distinguishes noise is its probability distribution,
the function that describes the probability of a random variable
falling within a certain range of values. In
physical measurements, the most common distribution is called normal and is described by a Gaussian function, in which the most common noise errors are small (that is, close to the mean)
and the errors become less common the greater their deviation from the
mean. Why is this so? The noise observed in physical measurements is
often the balanced sum of many unobserved random events, each of which
has some unknown
probability distribution related to, for example, the kinetic
properties of gases or liquids or to the quantum mechanical
description of fundamental particles such as photons or electrons. But
when many such events combine to form the overall variability of an
observed quantity, the resulting probability distribution is almost
always normal, that is, described by a Gaussian function. This common observation is summed up in the Central Limit Theorem.
This
is easily demonstrated by a simple simulation. In the example on the
left, we start with a single random variable that is uniformly
distributed, that is, has an equal chance of having any value between
certain limits (between 0 and +1 in this case). The graph in the
upper left of the figure shows the probability distribution, called a
“histogram”, of
that random variable. We begin by subtracting two sets of
independent, uniformly-distributed random variables so that the
average is zero. The result (shown in the graph in the upper right in
the figure) has a triangular distribution between -1 and +1, with the
highest point at zero, because there are many ways for the difference
between two random numbers to be small, but only one way for the
difference to be 1 or -1 (that happens only if one number is zero
and the other is 1). If we combine four independent random variables
(lower left), the resulting distribution has a total range of -2 to
+2, but it is even less likely that the result will be near 2 or -2 and
there are many more ways for the result to be small, so the distribution is
narrower and more rounded, and is starting to be visually close to a
normal Gaussian distribution (shown for reference in the lower
right). If we combine more and more independent uniform random
variables, the combined probability distribution becomes closer and
closer to normal. (You
can download a Matlab script for this simulation from
http://terpconnect.umd.edu/~toh/spectrum/CentralLimitDemo.m).
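The same experiment can be sketched in Python using only the standard library. This is not a translation of the Matlab script mentioned above; it is a minimal illustration that replaces the histograms with a single number, the fraction of values that fall within one standard deviation of the mean, which is about 0.683 for a normal distribution. The function name fraction_within_one_sd is hypothetical.

```python
import math
import random

def fraction_within_one_sd(values):
    """Fraction of values lying within one standard deviation of their mean."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(1 for v in values if abs(v - mean) <= sd) / n

rng = random.Random(3)          # fixed seed so the run is repeatable
n = 50000

# One uniform random variable between 0 and 1.
one = [rng.random() for _ in range(n)]
# Difference of two independent uniform variables (triangular distribution).
two = [rng.random() - rng.random() for _ in range(n)]
# Combination of four independent uniform variables.
four = [rng.random() - rng.random() + rng.random() - rng.random() for _ in range(n)]

f_one = fraction_within_one_sd(one)
f_two = fraction_within_one_sd(two)
f_four = fraction_within_one_sd(four)

print(f"1 variable : {f_one:.3f}")
print(f"2 variables: {f_two:.3f}")
print(f"4 variables: {f_four:.3f}")
print("normal     : 0.683")
```

The fraction should climb toward the normal value as more variables are combined, which is the numerical counterpart of the histograms becoming more rounded and bell-shaped.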
The interesting thing is that the distributions of the individual events hardly matter at all.
You could modify the individual distributions in this simulation by
including additional functions, such as sqrt(rand), sin(rand), rand^2,
log(rand), etc, to obtain other non-normal individual distributions. No
matter what the distribution of the single random variable is, by
the time you combine even as few as four of them, the resulting
distribution is already visually close to normal. Real world
macroscopic observations are often composed of thousands or millions of
individual microscopic events, so the approach to a normal distribution is essentially perfect. It
is on this common adherence to normal distributions that the common
statistical procedures are based; the use of the mean, standard deviation, least-squares fits, confidence limits,
etc, are all based on the assumption of a normal distribution. Even so,
experimental errors and noise are not always normal; sometimes there
are very large errors that fall well beyond the “normal” range. They
are called “outliers” and they can have a very large effect on the
standard deviation. In such cases it's common to use the “interquartile range”
(IQR), defined as the difference between the upper and lower quartiles,
instead of the standard deviation, because the IQR is much less affected by
outliers. For a normal distribution, the interquartile range is equal
to about 1.349 times the standard deviation. A quick way to check the
distribution of a large set of random numbers is to compute both the
standard deviation and the interquartile range divided by 1.349; if they are
approximately equal, the distribution is probably normal; if
the standard deviation is much larger, there are probably outliers and
the standard deviation without the outliers can be estimated by
dividing the interquartile range by 1.349.
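Here is a small Python sketch of that check. The data are simulated, and the way the outliers are generated (1% of the points replaced by values between 10 and 30 standard deviations from the mean) is purely an assumption for illustration. The conversion factor between the interquartile range and the standard deviation is computed from the normal distribution itself rather than typed in.

```python
import random
import statistics

# Ratio of interquartile range to standard deviation for a normal distribution.
IQR_PER_SD = 2 * statistics.NormalDist().inv_cdf(0.75)

def interquartile_range(values):
    """Difference between the upper and lower quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

rng = random.Random(4)          # fixed seed so the run is repeatable
clean = [rng.gauss(0.0, 1.0) for _ in range(10000)]

# Assumed contamination: replace the last 1% of the points with large errors.
outliers = [rng.choice([-1, 1]) * rng.uniform(10.0, 30.0) for _ in range(100)]
contaminated = clean[:9900] + outliers

sd_clean = statistics.stdev(clean)
sd_contaminated = statistics.stdev(contaminated)
robust_clean = interquartile_range(clean) / IQR_PER_SD
robust_contaminated = interquartile_range(contaminated) / IQR_PER_SD

print(f"clean data:        SD = {sd_clean:.2f}, IQR-based SD = {robust_clean:.2f}")
print(f"with 1% outliers:  SD = {sd_contaminated:.2f}, IQR-based SD = {robust_contaminated:.2f}")
```

For the clean data the two estimates should agree closely; with the outliers added, the ordinary standard deviation is inflated while the IQR-based estimate stays close to the standard deviation of the underlying normal noise.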
In
spectroscopy, three fundamental types of noise are recognized, based on their origin and on how they vary with light intensity: photon
noise, detector noise, and flicker (fluctuation) noise. Photon noise
(often the limiting noise in instruments
that use photomultiplier detectors) is white and is proportional to the square root
of light intensity,
and therefore the SNR is proportional to the square root of light
intensity and directly proportional to the monochromator slit width. Detector
noise (often the limiting noise in instruments
that use solid-state photodiode detectors) is independent of the light
intensity and therefore the detector
SNR is directly proportional to the light intensity and to the square
of the monochromator slit width. Flicker noise, caused by light source
instability, vibration, sample cell positioning errors, sample
turbulence, light scattering by
suspended particles, dust, bubbles, etc., is usually pink rather than white and is directly proportional to
the light intensity, so the flicker signal-to-noise ratio is independent of light intensity and is not improved by increasing
the slit width. Flicker noise can usually be reduced or
eliminated by using specialized instrument designs such as double-beam, dual wavelength, derivative, and
wavelength modulation.
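The scaling rules above follow from dividing the signal (proportional to light intensity) by each type of noise. The following Python sketch just does that arithmetic. It assumes, as the statements above imply, that the light intensity is proportional to the square of the slit width; the function names are hypothetical.

```python
def snr_gain_from_intensity(factor):
    """Factor by which each SNR changes when light intensity is multiplied by factor."""
    return {
        "photon": factor / factor ** 0.5,   # noise grows as the square root of intensity
        "detector": factor / 1.0,           # noise does not depend on intensity
        "flicker": factor / factor,         # noise grows in proportion to intensity
    }

def snr_gain_from_slit_width(factor):
    """Same, for a change in slit width, assuming intensity goes as slit width squared."""
    return snr_gain_from_intensity(factor ** 2)

gains = snr_gain_from_slit_width(2.0)
for noise_type, gain in gains.items():
    print(f"{noise_type:8s} noise limited: SNR changes by a factor of {gain:g}")
```

Doubling the slit width in this model doubles the photon-noise-limited SNR, quadruples the detector-noise-limited SNR, and leaves the flicker-noise-limited SNR unchanged.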
Video Demonstration of ensemble averaging. This 17-second video (EnsembleAverage1.wmv)
demonstrates the ensemble averaging of 1000 repeats of a signal with a
very poor signal-to-noise ratio. The signal itself consists of three
peaks located at x = 50, 100, and 150, with peak heights 1, 2, and 3
units. These signal peaks are buried in random noise whose standard
deviation is 10. Thus the signal-to-noise ratio of the smallest peak
is 0.1, which is far too low to even see a signal, much less measure it. The video shows the accumulating average signal as 1000
measurements of the signal are performed. At the end, the noise is
reduced (on average) by the square root of 1000 (about 32), so that the
signal-to-noise ratio of the smallest peak ends up being about 3, just
enough to detect the presence of a peak. Click here
to download the video (2 MBytes) in WMV format. (This demonstration was
created in Matlab 6.5. If you have access to that software, you may
download the original m-file, EnsembleAverage.zip).
SPECTRUM, the Macintosh freeware signal-processing
application that accompanies this tutorial, includes several functions
for measuring signals and noise in the Math and Window pull-down
menus, plus a signal-generator that can be used to generate artificial
signals with Gaussian and Lorentzian bands, sine waves, and
normally-distributed random noise in the New command in the File Menu.
Popular spreadsheets, such as Excel or OpenOffice Calc,
have built-in functions that can be used for measuring and plotting
signals and noise, such as AVERAGE, MAX, MIN, STDEV, and RAND.
Some spreadsheets have only a uniformly-distributed random number
function (rand) and not a normally-distributed random number function,
but it's much more realistic to simulate errors that are normally
distributed. In that case it's convenient to make use of the
Central Limit Theorem
to create approximately normally distributed random numbers by
combining several RAND functions, for example,
1.73*(RAND()-RAND()+RAND()-RAND()) creates nearly normal random
numbers with a mean of zero, a standard deviation very close
to 1, and a maximum range of about ±3.5. This trick is commonly used in spreadsheet models that simulate the operation of analytical instruments.
Matlab has built-in functions that can be used for measuring and plotting signals and noise, such as mean, max, min, std, plot, hist,
rand, and randn. Just type "help" and the function name at the Matlab
command >> prompt, e.g. "help mean". Most of these Matlab
functions apply to vectors and matrices as well as scalar variables.
So, for example if you have a set of signals in the rows of a
matrix S, where each column represents the value of each signal at
the same value of the independent variable (e.g. time), you can compute
the ensemble average of those signals just by typing "mean(S)", which
computes the mean of each column of S.
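Returning to the spreadsheet formula given earlier, 1.73*(RAND()-RAND()+RAND()-RAND()), its properties are easy to check numerically. Each uniform random number between 0 and 1 has a variance of 1/12, so the combination of four of them has a variance of 4/12 and a standard deviation of about 0.577; multiplying by 1.73 brings that very close to 1. The Python sketch below is only a stand-in for the spreadsheet, with random.random() playing the role of RAND(); the function name near_normal is hypothetical.

```python
import math
import random

rng = random.Random(5)          # fixed seed so the run is repeatable

def near_normal():
    """Python stand-in for the spreadsheet formula 1.73*(RAND()-RAND()+RAND()-RAND())."""
    return 1.73 * (rng.random() - rng.random() + rng.random() - rng.random())

samples = [near_normal() for _ in range(50000)]

n = len(samples)
sample_mean = sum(samples) / n
sample_sd = math.sqrt(sum((v - sample_mean) ** 2 for v in samples) / (n - 1))

expected_sd = 1.73 * math.sqrt(4 / 12)   # from the variance of a uniform variable
largest = max(abs(v) for v in samples)

print(f"mean              = {sample_mean:.3f}")
print(f"standard deviation = {sample_sd:.3f} (expected {expected_sd:.3f})")
print(f"largest |value|    = {largest:.2f} (cannot exceed {2 * 1.73:.2f})")
```

Unlike a true normal distribution, this approximation has hard limits, so it cannot produce the rare very large deviations that real normally distributed noise occasionally shows.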
You can also create custom user-defined functions
to automate commonly-used algorithms. I have created some that you can
download and use: functions to calculate typical
peak shapes commonly encountered in analytical chemistry (gaussian, lorentzian) and typical types of random noise (whitenoise, pinknoise),
which can be useful in modeling and simulating analytical signals and
testing measurement techniques. (Click on these links to inspect the
code, or right-click to download for use within Matlab). Once you have
created or downloaded those functions, you can use them to plot a
simulated noisy peak such as in Figure 3 by typing x=[1:256];plot(x,gaussian(x,128,59)+whitenoise(x)).
iSignal is
a downloadable user-defined Matlab function that can plot signals with
pan and zoom controls and can measure signal and noise amplitudes in
selected regions of the signal. It's operated by simple keypresses.
Other capabilities of iSignal include smoothing, differentiation, peak
sharpening, and least-squares peak measurement. View the code here or download the ZIP file with sample data for testing.
This page is maintained by Prof. Tom O'Haver, Department of Chemistry and
Biochemistry, The University of Maryland at College Park.
Comments, suggestions and questions should be directed to
Prof. O'Haver at toh@umd.edu.
Updated April 2012.