
Signals and noise

Experimental measurements are never perfect, even with sophisticated modern instruments. Two main types of measurement errors are recognized: (a) systematic error, in which every measurement is consistently less than or greater than the correct value by a certain percentage or amount, and (b) random error, in which there are unpredictable variations in the measured signal from moment to moment or from measurement to measurement. This latter type of error is often called noise, by analogy to acoustic noise. There are many sources of noise in physical measurements, such as building vibrations, air currents, electric power fluctuations, stray radiation from nearby electrical equipment, static electricity, interference from radio and TV transmissions, turbulence in the flow of gases or liquids, random thermal motion of molecules, background radiation from natural radioactive elements, "cosmic rays" from outer space (seriously), and even the basic quantum nature of matter and energy itself.

One of the fundamental problems in signal measurement is distinguishing the noise from the signal. It is not always easy. The signal is the "important" part of the data that you want to measure - it might be the average of the signal over a certain time period, or it might be the height of a peak or the area under a peak that occurs in the data. For example, in the absorption spectrum in the right-hand half of Figure 1 in the previous section, the "important" parts of the data are probably the absorption peaks located at 520 and 550 nm. The height of either of those peaks might be considered the signal, depending on the application. In this example, the height of the largest peak is about 0.08 absorbance units. The noise would be the standard deviation of that peak height from spectrum to spectrum (if you had access to repeat measurements of the same spectrum). But what if you had only one recording of that spectrum?  In that case, you'd be forced to estimate the noise in that single recording, based on the assumption that the visible short-term fluctuations in the signal (the little random wiggles superimposed on the smooth signal) are noise and not part of the signal.  In this case, those fluctuations amount to about 0.005 units peak-to-peak or a standard deviation of 0.001. (For random fluctuations, the general rule of thumb is that the standard deviation is approximately 1/5 of the peak-to-peak variation between the highest and the lowest readings).  As another example, the data on the right half of Figure 3, below, has a peak in the center with a height of about 1.0.  The peak-to-peak noise on the baseline is also about 1.0, so the standard deviation of the noise is about 1/5th of that, or 0.2. 
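The rule of thumb relating peak-to-peak variation to standard deviation can be checked with a short simulation. The following is only a sketch in Python (not the Matlab used elsewhere in this tutorial), and the function name and trial counts are my own choices for illustration. It assumes normally distributed noise with a standard deviation of 1 and reports the average peak-to-peak range for records of different lengths. The ratio is not a constant: it comes out close to 5 for records of about a hundred points, smaller for shorter records and larger for longer ones.

```python
import random
import statistics

def peak_to_peak_over_sigma(n_points, n_trials=500, seed=0):
    """Average ratio of peak-to-peak range to the true standard deviation
    for n_points samples of normally distributed noise (sigma = 1)."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(n_trials):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        # sigma is 1, so the range itself is the ratio we want
        ranges.append(max(noise) - min(noise))
    return statistics.mean(ranges)

if __name__ == "__main__":
    for n in (20, 100, 1000):
        print(n, round(peak_to_peak_over_sigma(n), 2))
```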

The quality of a signal is often expressed quantitatively as the signal-to-noise ratio (SNR), which is the ratio of the true signal amplitude (e.g. the average amplitude or the peak height) to the standard deviation of the noise. Thus the signal-to-noise ratio of the spectrum in Figure 1 is about 0.08/0.001 = 80, and the signal in Figure 3 has a signal-to-noise ratio of 1.0/0.2 = 5. So we would say that the quality of the signal in Figure 1 is better than that in Figure 3 because it has a greater SNR. Signal-to-noise ratio is inversely proportional to the relative standard deviation of the signal amplitude. Measuring the signal-to-noise ratio is much easier if the noise can be measured separately, in the absence of signal. Depending on the type of experiment, it may be possible to acquire readings of the noise alone, for example on a segment of the baseline before or after the occurrence of the signal. However, if the magnitude of the noise depends on the level of the signal, then the experimenter must try to produce a constant signal level to allow measurement of the noise on the signal. (In cases where it is possible to model the shape of the signal exactly by means of a mathematical function, the noise may be estimated by subtracting the model signal from the experimental signal). If possible, it's always better to determine the standard deviation of repeated measurements of the thing that you want to measure, rather than trying to estimate the noise from a single recording of the data.
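As an illustration of estimating the signal-to-noise ratio from a single recording, here is a minimal Python sketch. The simulated data, the function name estimate_snr, and the choice of the first 50 points as the baseline are all assumptions made for this example. The noise is taken as the standard deviation of a baseline segment that contains no signal, and the signal as the peak height above that baseline. Note that taking the maximum reading as the peak height biases the height slightly upward, because the noise on the peak is included.

```python
import math
import random
import statistics

def estimate_snr(signal, baseline_slice):
    """Peak height above the mean baseline divided by the standard
    deviation of the baseline segment (assumed to contain only noise)."""
    baseline = signal[baseline_slice]
    noise_sd = statistics.stdev(baseline)
    height = max(signal) - statistics.mean(baseline)
    return height / noise_sd

# Simulated recording (hypothetical data): one Gaussian-shaped peak of
# height 1.0 centred at x = 128, plus normally distributed noise with a
# standard deviation of 0.02, so the true SNR is 1.0 / 0.02 = 50.
rng = random.Random(1)
recording = [math.exp(-((xi - 128) / 20.0) ** 2) + rng.gauss(0.0, 0.02)
             for xi in range(256)]

snr = estimate_snr(recording, slice(0, 50))  # first 50 points are baseline

if __name__ == "__main__":
    print("estimated SNR:", round(snr, 1))
```

With only 50 baseline points the estimate of the noise is itself uncertain by roughly ten percent, which is one reason repeated measurements are preferable when they are available.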

Sometimes the signal and the noise can be partly distinguished on the basis of frequency components: for example, the signal may contain mostly low-frequency components and the noise may be located at higher frequencies. This is the basis of filtering and smoothing. In both Figure 1 and Figure 3, the peaks contain mostly low-frequency components, whereas the noise is distributed over a much wider frequency range.

One key thing that really distinguishes signal from noise is that random noise is not the same from one measurement of the signal to the next, whereas the genuine signal is at least partially reproducible. So if the signal can be measured more than once, use can be made of this fact by measuring the signal over and over again, as fast as is practical, and adding up all the measurements point-by-point, then dividing by the number of signals averaged. This is called ensemble averaging, and it is one of the most powerful methods for improving signals, when it can be applied. For this to work properly, the noise must be random and the signal must occur at the same time in each repeat. An example is shown in Figure 3. Another example (EnsembleAverage1.wmv) demonstrates the ensemble averaging of 1000 repeats of a signal, which improves the signal-to-noise ratio by about 30 times.

Figure 3. Window 1 (left) is a single measurement of a very noisy signal. There is actually a broad peak near the center of this signal, but it is not possible to measure its position, width, and height accurately because the signal-to-noise ratio is very poor. Window 2 (right) is the average of 9 repeated measurements of this signal, clearly showing the peak emerging from the noise. The expected improvement in signal-to-noise ratio is 3 (the square root of 9). Often it is possible to average hundreds of measurements, resulting in a much more substantial improvement. The signal-to-noise ratio in the resulting average signal in this example is about 5.
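Ensemble averaging is simple enough to sketch in a few lines. The Python example below is not the code used to make Figure 3; the simulated peak, the noise level, and the function names are assumptions chosen for illustration. It averages 9 simulated repeats point by point and compares the noise on a signal-free baseline segment before and after averaging. The ratio should come out near 3, the square root of 9, although it will vary somewhat from run to run because the noise estimates are themselves noisy.

```python
import math
import random
import statistics

def noisy_measurement(rng, n_points=1000, noise_sd=1.0):
    """One simulated repeat: a broad peak of height 1 centred at point 500,
    plus white noise. The peak always occurs at the same position."""
    return [math.exp(-((i - 500) / 50.0) ** 2) + rng.gauss(0.0, noise_sd)
            for i in range(n_points)]

def ensemble_average(repeats):
    """Add the repeats point by point and divide by the number of repeats."""
    n = len(repeats)
    return [sum(column) / n for column in zip(*repeats)]

rng = random.Random(3)
repeats = [noisy_measurement(rng) for _ in range(9)]
averaged = ensemble_average(repeats)

# Noise on a baseline segment (first 300 points, far from the peak)
# in a single repeat and in the ensemble average.
noise_single = statistics.stdev(repeats[0][:300])
noise_averaged = statistics.stdev(averaged[:300])
improvement = noise_single / noise_averaged

if __name__ == "__main__":
    print("noise reduced by a factor of", round(improvement, 2))
```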

Noise is also distinguished by its frequency spectrum. White noise has equal power at all frequencies. It derives its name from white light, which has equal brightness at all wavelengths in the visible region. The noise in the example signal in Figure 3 is white. In the acoustical domain, white noise sounds like a hiss. In measurement science, white noise is common, but so is noise that has a more low-frequency-weighted character, that is, noise that has more power at low frequencies than at high frequencies. This is often called "pink noise". In the acoustical domain, pink noise sounds more like a roar. A commonly-encountered sub-species of that type of noise is "1/f noise", where the noise power is inversely proportional to frequency. The application of smoothing and low-pass filtering to reduce noise is more effective for white noise than for pink noise. When pink noise is present, it is sometimes beneficial to apply modulation techniques, such as optical chopping or wavelength modulation, to convert a direct-current (DC) signal into an alternating current (AC) signal, thereby increasing the frequency of the signal to a frequency region where the noise is lower. In such cases it is common to use a lock-in amplifier, or the digital equivalent thereof, to measure the amplitude of the signal.
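The claim that smoothing works better on white noise than on low-frequency-weighted noise can be illustrated with a rough Python sketch. An important caveat: the "colored" noise generated here is not true 1/f noise. It is a simple stand-in in which each point retains most of the previous point's value, which is enough to give it excess power at low frequencies. The function names and the smooth width of 9 are also just assumptions for this example. A 9-point sliding-average smooth reduces the white noise by about a factor of 3 but barely reduces the low-frequency noise at all.

```python
import random
import statistics

def moving_average(data, width):
    """Unweighted sliding-average smooth; returns len(data) - width + 1 points."""
    window_sum = sum(data[:width])
    smoothed = [window_sum / width]
    for i in range(width, len(data)):
        window_sum += data[i] - data[i - width]
        smoothed.append(window_sum / width)
    return smoothed

def white_noise(rng, n):
    """Normally distributed white noise with standard deviation 1."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def low_frequency_noise(rng, n, memory=0.95):
    """Stand-in for low-frequency-weighted noise: each point keeps a fraction
    `memory` of the previous point. This is NOT true 1/f noise, only a simple
    noise with more power at low frequencies than at high frequencies."""
    value, out = 0.0, []
    for _ in range(n):
        value = memory * value + rng.gauss(0.0, 1.0)
        out.append(value)
    return out

def noise_reduction(noise, width):
    """Factor by which smoothing reduces the standard deviation of the noise."""
    return statistics.pstdev(noise) / statistics.pstdev(moving_average(noise, width))

rng = random.Random(7)
white_factor = noise_reduction(white_noise(rng, 20000), 9)
colored_factor = noise_reduction(low_frequency_noise(rng, 20000), 9)

if __name__ == "__main__":
    print("white noise reduced by", round(white_factor, 2))
    print("low-frequency noise reduced by", round(colored_factor, 2))
```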

Another property that distinguishes noise is its probability distribution, the function that describes the probability of a random variable falling within a certain range of values. In physical measurements, the most common distribution is called normal and is described by a Gaussian function, in which the most common noise errors are small (that is, close to the mean) and the errors become less common the greater their deviation from the mean. Why is this so? The noise observed in physical measurements is often the balanced sum of many unobserved random events, each of which has some unknown probability distribution related to, for example, the kinetic properties of gases or liquids or to the quantum mechanical description of fundamental particles such as photons or electrons. But when many such events combine to form the overall variability of an observed quantity, the resulting probability distribution is almost always normal, that is, described by a Gaussian function. This common observation is summed up in the Central Limit Theorem.

This is easily demonstrated by a simple simulation. In the example on the left, we start with a single random variable that is uniformly distributed, that is, has an equal chance of having any value between certain limits (between 0 and +1 in this case). The graph in the upper left of the figure shows the probability distribution, called a “histogram”, of that random variable. We begin by subtracting two sets of independent, uniformly-distributed random variables so that the average is zero. The result (shown in the graph in the upper right in the figure) has a triangular distribution between -1 and +1, with the highest point at zero, because there are many ways for the difference between two random numbers to be small, but only one way for the difference to be 1 or -1 (that happens only if one number is zero and the other is 1). If we combine four independent random variables (lower left), the resulting distribution has a total range of -2 to +2, but it is even less likely that the result will be near 2 or -2 and there are many more ways for the result to be small, so the distribution is narrower and more rounded, and is starting to be visually close to a normal Gaussian distribution (shown for reference in the lower right). If we combine more and more independent uniform random variables, the combined probability distribution becomes closer and closer to normal. (You can download a Matlab script for this simulation from http://terpconnect.umd.edu/~toh/spectrum/CentralLimitDemo.m).
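For readers without Matlab, the same idea can be sketched in Python. This is not a translation of CentralLimitDemo.m; it is an independent illustration, and the function names are my own. Instead of plotting histograms, it uses the excess kurtosis as a single-number measure of how far a distribution is from normal. That choice is an assumption of this sketch: excess kurtosis is zero for a normal distribution and negative for flat-topped distributions such as the uniform, so it should move toward zero as more uniform random variables are combined.

```python
import random
import statistics

def combined_uniform(rng, n_terms):
    """Alternately add and subtract n_terms uniform random numbers on [0, 1),
    e.g. rand - rand + rand - rand for n_terms = 4."""
    return sum(rng.random() if k % 2 == 0 else -rng.random()
               for k in range(n_terms))

def excess_kurtosis(values):
    """Fourth standardized moment minus 3; zero for a normal distribution."""
    mean = statistics.fmean(values)
    variance = statistics.fmean([(v - mean) ** 2 for v in values])
    fourth = statistics.fmean([(v - mean) ** 4 for v in values])
    return fourth / variance ** 2 - 3.0

rng = random.Random(11)
kurtosis = {}
for n_terms in (1, 2, 4, 12):
    sample = [combined_uniform(rng, n_terms) for _ in range(50000)]
    kurtosis[n_terms] = excess_kurtosis(sample)

if __name__ == "__main__":
    for n_terms, k in kurtosis.items():
        print(n_terms, "terms: excess kurtosis", round(k, 3))
```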


The interesting thing is that the distributions of the individual events hardly matter at all. You could modify the individual distributions in this simulation by including additional functions, such as sqrt(rand), sin(rand), rand^2, log(rand), etc, to obtain other non-normal individual distributions. No matter what the distribution of the single random variable is, by the time you combine even as few as four of them, the resulting distribution is already visually close to normal. Real world macroscopic observations are often composed of thousands or millions of individual microscopic events, so the approach to a normal distribution is essentially perfect. It is on this common adherence to normal distributions that the common statistical procedures are based; the use of the mean, standard deviation, least-squares fits, confidence limits, etc, are all based on the assumption of a normal distribution. Even so, experimental errors and noise are not always normal; sometimes there are very large errors that fall well beyond the “normal” range. They are called “outliers” and they can have a very large effect on the standard deviation. In such cases it's common to use the “interquartile range” (IQR), defined as the difference between the upper and lower quartiles, instead of the standard deviation, because the IQR is hardly affected by a few outliers. For a normal distribution, the interquartile range is equal to 1.34896 times the standard deviation. A quick way to check the distribution of a large set of random numbers is to compute both the standard deviation and the interquartile range divided by 1.34896; if they are approximately equal, the distribution is probably normal; if the standard deviation is much larger, there are probably outliers and the standard deviation without the outliers can be estimated by dividing the interquartile range by 1.34896.
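Here is a small Python sketch of the interquartile range check. The simulated data set, the 1% outlier fraction, and the outlier size of 50 are arbitrary assumptions made for illustration, and robust_sd is a hypothetical helper name. For the clean, normally distributed data the ordinary standard deviation and the IQR divided by 1.34896 should agree closely; after a few large outliers are added, the ordinary standard deviation is inflated several-fold while the IQR-based estimate hardly changes.

```python
import random
import statistics

def robust_sd(values):
    """Estimate the standard deviation from the interquartile range,
    assuming the bulk of the data is normally distributed."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / 1.34896

rng = random.Random(5)
clean = [rng.gauss(0.0, 1.0) for _ in range(10000)]

# Replace 1% of the points with large errors (outliers) of size +/-50.
contaminated = list(clean)
for i in range(0, len(contaminated), 100):
    contaminated[i] = rng.choice((-50.0, 50.0))

sd_clean, robust_clean = statistics.stdev(clean), robust_sd(clean)
sd_dirty, robust_dirty = statistics.stdev(contaminated), robust_sd(contaminated)

if __name__ == "__main__":
    print("clean:        sd =", round(sd_clean, 3), " IQR/1.34896 =", round(robust_clean, 3))
    print("with outliers: sd =", round(sd_dirty, 3), " IQR/1.34896 =", round(robust_dirty, 3))
```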

In spectroscopy, three fundamental types of noise are recognized, based on their origin and on how they vary with light intensity: photon noise, detector noise, and flicker (fluctuation) noise. Photon noise (often the limiting noise in instruments that use photomultiplier detectors) is white and is proportional to the square root of light intensity, and therefore the SNR is proportional to the square root of light intensity and directly proportional to the monochromator slit width. Detector noise (often the limiting noise in instruments that use solid-state photodiode detectors) is independent of the light intensity and therefore the detector SNR is directly proportional to the light intensity and to the square of the monochromator slit width. Flicker noise, caused by light source instability, vibration, sample cell positioning errors, sample turbulence, light scattering by suspended particles, dust, bubbles, etc., is usually pink rather than white and is directly proportional to the light intensity, so the flicker signal-to-noise ratio is independent of the light intensity and is not improved by increasing the slit width. Flicker noise can usually be reduced or eliminated by using specialized instrument designs such as double-beam, dual wavelength, derivative, and wavelength modulation.
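The three intensity dependences can be summarized in a few lines of Python. This is only a bookkeeping sketch with hypothetical function names: the proportionality constants are all set to 1, so only ratios are meaningful, and the relation between slit width and intensity (intensity proportional to the square of the slit width) is the one implied by the paragraph above. Doubling the slit width then improves the signal-to-noise ratio by a factor of 2 in the photon-noise-limited case and 4 in the detector-noise-limited case, and leaves it unchanged in the flicker-noise-limited case.

```python
def relative_snr(noise_type, intensity):
    """SNR relative to its value at intensity = 1 for each limiting noise
    type. The signal is taken as proportional to the light intensity."""
    signal = intensity
    if noise_type == "photon":      # noise proportional to sqrt(intensity)
        noise = intensity ** 0.5
    elif noise_type == "detector":  # noise independent of intensity
        noise = 1.0
    elif noise_type == "flicker":   # noise proportional to intensity
        noise = intensity
    else:
        raise ValueError(f"unknown noise type: {noise_type}")
    return signal / noise

def intensity_from_slit(slit_width):
    """Assumption taken from the text: light intensity is proportional
    to the square of the monochromator slit width."""
    return slit_width ** 2

# Effect of doubling the slit width on the SNR in each noise-limited case.
doubling_gain = {
    kind: relative_snr(kind, intensity_from_slit(2.0))
          / relative_snr(kind, intensity_from_slit(1.0))
    for kind in ("photon", "detector", "flicker")
}

if __name__ == "__main__":
    print(doubling_gain)
```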

Video Demonstration of ensemble averaging. This 17-second video (EnsembleAverage1.wmv) demonstrates the ensemble averaging of 1000 repeats of a signal with a very poor signal-to-noise ratio. The signal itself consists of three peaks located at x = 50, 100, and 150, with peak heights 1, 2, and 3 units. These signal peaks are buried in random noise whose standard deviation is 10. Thus the signal-to-noise ratio of the smallest peak is 0.1, which is far too low to even see a signal, much less measure it. The video shows the accumulating average signal as 1000 measurements of the signal are performed. At the end, the noise is reduced (on average) by the square root of 1000 (about 32), so that the signal-to-noise ratio of the smallest peak ends up being about 3, just enough to detect the presence of a peak. Click here to download the video (2 MBytes) in WMV format. (This demonstration was created in Matlab 6.5. If you have access to that software, you may download the original m-file, EnsembleAverage.zip).


SPECTRUM, the Macintosh freeware signal-processing application that accompanies this tutorial, includes several functions for measuring signals and noise in the Math and Window pull-down menus, plus a signal-generator that can be used to generate artificial signals with Gaussian and Lorentzian bands, sine waves, and normally-distributed random noise in the New command in the File Menu.
Popular spreadsheets, such as Excel or Open Office Calc, have built-in functions that can be used for measuring and plotting signals and noise, such as AVERAGE, MAX, MIN, STDEV, and RAND. Some spreadsheets have only a uniformly-distributed random number function (rand) and not a normally-distributed random number function, but it's much more realistic to simulate errors that are normally distributed. In that case it's convenient to make use of the Central Limit Theorem to create approximately normally distributed random numbers by combining several RAND functions, for example, 1.73*(RAND()-RAND()+RAND()-RAND()) creates nearly normal random numbers with a mean of zero, a standard deviation very close to 1, and a maximum range of about ±3.5 (that is, 2 × 1.73). This trick is commonly used in spreadsheet models that simulate the operation of analytical instruments.
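The spreadsheet formula can be checked outside a spreadsheet. The Python sketch below (approx_normal is a hypothetical name) reproduces 1.73*(RAND()-RAND()+RAND()-RAND()) using a uniform random number generator and computes the mean, the standard deviation, and the largest deviation of a large sample. The factor 1.73 works because each uniform random number has a variance of 1/12, so the combination of four has a variance of 1/3 and a standard deviation of about 0.577, which is 1/1.73. No value can exceed 2 × 1.73 in magnitude, since the expression in parentheses is confined to the range -2 to +2.

```python
import random
import statistics

def approx_normal(rng):
    """Python equivalent of the spreadsheet formula
    1.73*(RAND()-RAND()+RAND()-RAND())."""
    return 1.73 * (rng.random() - rng.random() + rng.random() - rng.random())

rng = random.Random(2)
sample = [approx_normal(rng) for _ in range(100000)]

mean = statistics.fmean(sample)
sd = statistics.pstdev(sample)
largest = max(abs(v) for v in sample)

if __name__ == "__main__":
    print("mean =", round(mean, 4), " sd =", round(sd, 4),
          " largest deviation =", round(largest, 3))
```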
Matlab has built-in functions that can be used for measuring and plotting signals and noise, such as mean, max, min, std, plot, hist, rand, and randn. Just type "help" and the function name at the Matlab command >> prompt, e.g. "help mean".  Most of these Matlab functions apply to vectors and matrices as well as scalar variables. So, for example if you have a set of signals in the rows of a matrix S, where each column represents the value of each signal at the same value of the independent variable (e.g. time), you can compute the ensemble average of those signals just by typing "mean(S)", which computes the mean of each column of S.  

You can also create custom user-defined functions to automate commonly-used algorithms. I have created some that you can download and use: functions to calculate typical peak shapes commonly encountered in analytical chemistry (gaussian, lorentzian) and typical types of random noise (whitenoise, pinknoise), which can be useful in modeling and simulating analytical signals and testing measurement techniques. (Click on these links to inspect the code, or right-click to download for use within Matlab). Once you have created or downloaded those functions, you can use them to plot a simulated noisy peak such as in Figure 3 by typing x=[1:256];plot(x,gaussian(x,128,59)+whitenoise(x)).
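For readers working outside Matlab, similar helper functions are easy to write. The Python versions below are my own re-implementations for illustration, not the downloadable Matlab functions, and they rest on two assumptions: that the third argument is the full width at half maximum of the peak, and that the white noise has a mean of zero and a standard deviation of 1. Both peak functions have unit height at the peak position and fall to one-half at a distance of half the width on either side.

```python
import math
import random

def gaussian(x, pos, wid):
    """Gaussian peak of unit height centred at pos; wid is assumed to be
    the full width at half maximum."""
    c = wid / (2.0 * math.sqrt(math.log(2.0)))
    return [math.exp(-((xi - pos) / c) ** 2) for xi in x]

def lorentzian(x, pos, wid):
    """Lorentzian peak of unit height centred at pos; wid is assumed to be
    the full width at half maximum."""
    return [1.0 / (1.0 + ((xi - pos) / (0.5 * wid)) ** 2) for xi in x]

def whitenoise(x, rng=random):
    """Normally distributed white noise (mean 0, standard deviation 1),
    one value for each element of x."""
    return [rng.gauss(0.0, 1.0) for _ in x]

# Simulated noisy peak with the same arguments as the Matlab one-liner.
x = list(range(1, 257))
peak = gaussian(x, 128, 59)
noisy_peak = [p + n for p, n in zip(peak, whitenoise(x, random.Random(4)))]
```

The resulting noisy_peak list can be plotted against x with whatever plotting tool is at hand.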

iSignal is a downloadable user-defined Matlab function that can plot signals with pan and zoom controls and can measure signal and noise amplitudes in selected regions of the signal. It's operated by simple keypresses. Other capabilities of iSignal include smoothing, differentiation, peak sharpening, and least-squares peak measurement. View the code here or download the ZIP file with sample data for testing.
This page is maintained by Prof. Tom O'Haver , Department of Chemistry and Biochemistry, The University of Maryland at College Park. Comments, suggestions and questions should be directed to Prof. O'Haver at toh@umd.edu. Updated April 2012.