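Smoothing algorithms. The simplest smoothing algorithm is the rectangular or unweighted sliding-average smooth; it simply replaces each point in the signal with the average of m adjacent points, where m is a positive integer called the smooth width. For example, for a 3-point smooth (m = 3):

$$S_j = \frac{Y_{j-1} + Y_j + Y_{j+1}}{3}$$

for j = 2 to n-1, where S_j is the j-th point in the smoothed signal, Y_j is the j-th point in the original signal, and n is the total number of points in the signal.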
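The sliding average described above can be sketched in a few lines of Python. This is only an illustration, not the author's code: the function name rectangular_smooth is made up here, the smooth width is assumed to be odd so that the window is centered on each point, and the points at the ends that lack a full window are simply set to zero.

```python
def rectangular_smooth(y, m):
    """Unweighted sliding average of odd width m.

    Assumption for this sketch: points without a full window
    (the first and last (m-1)/2 points) are left at zero.
    """
    if m < 1 or m % 2 == 0:
        raise ValueError("this sketch expects a positive odd smooth width")
    n = len(y)
    half = (m - 1) // 2
    s = [0.0] * n
    for j in range(half, n - half):
        # average of the m points centered on point j
        s[j] = sum(y[j - half : j + half + 1]) / m
    return s


signal = [1.0, 2.0, 6.0, 4.0, 3.0]
smoothed = rectangular_smooth(signal, 3)
print(smoothed)
```

Each interior point of the result is the mean of itself and its two neighbors; for example the second point becomes (1 + 2 + 6)/3 = 3.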
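The triangular smooth is like the rectangular smooth, above, except that it implements a weighted smoothing function. For a 5-point smooth (m = 5):

$$S_j = \frac{Y_{j-2} + 2Y_{j-1} + 3Y_j + 2Y_{j+1} + Y_{j+2}}{9}$$

for j = 3 to n-2 (S_j is the j-th point in the smoothed signal and Y_j the j-th point in the original signal).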
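A weighted smooth of this kind can be sketched as follows. The helper names triangular_weights and weighted_smooth are hypothetical, the width is assumed to be odd, and end points without a full window are set to zero. Smoothing a single spike is a convenient check, because the output then reproduces the weights themselves.

```python
def triangular_weights(m):
    """Weights 1, 2, ..., peak, ..., 2, 1 for odd width m, normalized to sum to 1."""
    half = (m - 1) // 2
    raw = [half + 1 - abs(k) for k in range(-half, half + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_smooth(y, weights):
    """Weighted sliding average; points without a full window stay at zero."""
    half = (len(weights) - 1) // 2
    s = [0.0] * len(y)
    for j in range(half, len(y) - half):
        s[j] = sum(w * y[j - half + k] for k, w in enumerate(weights))
    return s


# A single spike of height 9: the 5-point triangular smooth spreads it
# into the pattern 1, 2, 3, 2, 1 (the weights times 9).
spike = [0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0]
tri = weighted_smooth(spike, triangular_weights(5))
print([round(v, 6) for v in tri])
```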
End effects and the lost points problem. Note in the equations above that the 3-point rectangular smooth is defined only for j = 2 to n-1. There is not enough data in the signal to define a complete 3-point smooth for the first point in the signal (j = 1) or for the last point (j = n), because there are no data points before the first point or after the last point. Similarly, a 5-point smooth is defined only for j = 3 to n-2, and therefore a smooth cannot be calculated for the first two points or for the last two points. In general, for an m-width smooth, there will be (m-1)/2 points at the beginning of the signal and (m-1)/2 points at the end of the signal for which a complete m-width smooth cannot be calculated. What to do? There are two approaches. One is to accept the loss of points and trim off those points or replace them with zeros in the smoothed signal. (That's the approach taken in most of the figures in this paper.) The other approach is to use progressively smaller smooths at the ends of the signal, for example to use 2, 3, 5, 7... point smooths for signal points 1, 2, 3, and 4..., and for points n, n-1, n-2, n-3..., respectively. The latter approach may be preferable if the edges of the signal contain critical information, but it increases execution time. The fastsmooth function discussed below can utilize either of these two methods.
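The two ways of handling the ends can be compared in a short sketch. The function name smooth_with_edges and its taper_edges flag are invented for this illustration; the tapering follows the 2, 3, 5, 7... pattern given in the text, and the signal is assumed to be at least as long as the smooth width.

```python
def smooth_with_edges(y, m, taper_edges=True):
    """Rectangular smooth of odd width m with a choice of edge handling.

    taper_edges=False: edge points are replaced with zeros.
    taper_edges=True:  edge points use progressively smaller smooths
                       (2, 3, 5, 7... points for points 1, 2, 3, 4...).
    """
    n = len(y)
    half = (m - 1) // 2
    s = [0.0] * n
    for j in range(n):
        h = min(half, j, n - 1 - j)  # largest symmetric half-width that fits
        if h == half:
            s[j] = sum(y[j - half : j + half + 1]) / m  # full-width smooth
        elif not taper_edges:
            s[j] = 0.0
        elif h == 0:
            # first or last point: 2-point average with its only neighbor
            neighbor = 1 if j == 0 else n - 2
            s[j] = (y[j] + y[neighbor]) / 2
        else:
            s[j] = sum(y[j - h : j + h + 1]) / (2 * h + 1)
    return s


ramp = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
zeroed = smooth_with_edges(ramp, 5, taper_edges=False)
tapered = smooth_with_edges(ramp, 5, taper_edges=True)
print(zeroed)
print(tapered)
```

With zeroed edges the first two and last two points of the ramp are lost; with tapering they are kept, at the cost of less smoothing near the ends.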
Examples of smoothing. A simple example of smoothing is shown in Figure 4. The left half of this signal is a noisy peak. The right half is the same peak after undergoing a triangular smoothing algorithm. The noise is greatly reduced while the peak itself is hardly changed. Smoothing increases the signal-to-noise ratio and allows the signal characteristics (peak position, height, width, area, etc.) to be measured more accurately, especially when computer-automated methods of locating and measuring peaks are being employed.
Figure 4. The left half of this signal is a noisy peak. The right half is the same peak after undergoing a smoothing algorithm. The noise is greatly reduced while the peak itself is hardly changed, making it easier to measure the peak position, height, and width.
The larger the smooth width, the greater the noise reduction, but also the greater the possibility that the signal will be distorted by the smoothing operation. The optimum choice of smooth width depends upon the width and shape of the signal and the digitization interval. For peak-type signals, the critical factor is the smoothing ratio, the ratio between the smooth width m and the number of points in the half-width of the peak. In general, increasing the smoothing ratio improves the signal-to-noise ratio but causes a reduction in amplitude and an increase in the width of the peak.
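The link between smooth width and noise reduction can be checked numerically. The sketch below assumes white (uncorrelated) noise, for which an m-point rectangular smooth reduces the standard deviation by roughly the square root of m; other kinds of noise behave differently. The seed, the noise length, and the two widths are arbitrary choices for the illustration.

```python
import random
import statistics


def sliding_average(y, m):
    """Rectangular smooth of odd width m; only fully defined points are returned."""
    half = (m - 1) // 2
    return [sum(y[j - half : j + half + 1]) / m for j in range(half, len(y) - half)]


rng = random.Random(1)  # fixed seed so the run is repeatable
noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]

reduction = {}
for m in (5, 25):
    smoothed = sliding_average(noise, m)
    # factor by which the noise standard deviation has been reduced
    reduction[m] = statistics.pstdev(noise) / statistics.pstdev(smoothed)
    print(m, round(reduction[m], 2), "expected about", round(m ** 0.5, 2))
```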
The figures above show examples of the effect of three different smooth
widths on noisy Gaussian-shaped peaks. In the figure on the left, the
peak has a (true) height of 2.0 and there are 80 points in the
half-width of the peak. The red line is the original unsmoothed peak.
The three superimposed green lines are the results of smoothing this
peak with a triangular smooth of width (from top to bottom) 7, 25, and
51 points. Because the peak width is 80 points, the smooth ratios
of these three smooths are 7/80 = 0.09, 25/80 = 0.31, and 51/80 = 0.64,
respectively. As the smooth width increases, the noise is progressively
reduced but the peak height also is reduced slightly. For the largest
smooth, the peak width is slightly increased. In the figure on the right,
the original peak (in red) has a true height of 1.0 and a half-width of
33 points. (It is also less noisy than the example on the left.) The
three superimposed green lines are the results of the same three
triangular smooths of width (from top to bottom) 7, 25, and 51 points.
But because the peak width in this case is only 33 points, the smooth ratios
of these three smooths are larger - 0.21, 0.76, and 1.55, respectively.
You can see that the peak distortion effect (reduction of peak height and
increase in peak width) is greater for the narrower peak because the
smooth ratios are higher. Smooth ratios of greater than 1.0 are seldom
used because of excessive peak distortion. Note that even in the worst
case, the peak positions are not affected (assuming that the original
peaks were symmetrical and not overlapped by other peaks).
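The smooth ratios quoted above, and the distortion they cause, can be reproduced on a noise-free Gaussian peak. This is a sketch under stated assumptions: the triangular smooth of width m is taken to be two passes of an m-point sliding average, the peak has a half-width of 33 points as in the right-hand example, and the function names are made up for the illustration.

```python
import math


def gaussian_peak(n, center, fwhm, height):
    """Gaussian peak with the given full width at half maximum, in points."""
    sigma = fwhm / (2 * math.sqrt(2 * math.log(2)))
    return [height * math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(n)]


def sliding_average(y, m):
    """Rectangular smooth of odd width m; edge points are set to zero."""
    half = (m - 1) // 2
    s = [0.0] * len(y)
    for j in range(half, len(y) - half):
        s[j] = sum(y[j - half : j + half + 1]) / m
    return s


def triangular(y, m):
    """Assumed here: triangular smooth = two passes of the sliding average."""
    return sliding_average(sliding_average(y, m), m)


half_width = 33
peak = gaussian_peak(401, 200, half_width, 1.0)

results = {}
for m in (7, 25, 51):
    s = triangular(peak, m)
    results[m] = {
        "ratio": round(m / half_width, 2),  # smooth ratio
        "height": max(s),                   # peak height after smoothing
        "position": s.index(max(s)),        # index of the maximum
    }
    print(m, results[m])
```

The height falls as the smooth ratio rises, while the position of the maximum of this symmetrical peak stays at point 200.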
It's important to point out that smoothing results such as illustrated in the figure above may be deceptively impressive because they employ a single sample of a noisy signal that is smoothed to different degrees. This causes the viewer to underestimate the contribution of low-frequency noise, which is hard to estimate visually because there are so few low-frequency cycles in the signal record. This error can be remedied by taking a large number of independent samples of noisy signal. This is illustrated in the Interactive Smoothing module for Matlab, which includes a "Resample" control that swaps the noise in the signal with different random noise samples, to demonstrate the low-frequency noise that remains in the signal after smoothing. This gives a much more realistic impression of the performance of smoothing.
Optimization of smoothing. Which is the best smooth ratio? It depends on the purpose of the peak measurement. If the objective of the measurement is to measure the true peak height and width, then smooth ratios below 0.2 should be used. (In the example on the left, the original peak, shown as the red line, has a peak height greater than the true value 2.0 because of the noise, whereas the smoothed peak with a smooth ratio of 0.09 has a peak height that is much closer to the correct value.) But if the objective of the measurement is to measure the peak position (x-axis value of the peak), much larger smooth ratios can be employed if desired, because smoothing does not shift the position of a symmetrical peak (unless the increase in peak width is so much that it causes adjacent peaks to overlap).
In quantitative analysis applications, the peak height
reduction caused by smoothing is not so important, because in most
cases calibration is based on the signals of standard
solutions. If the same signal processing operations are applied to the
samples and to the
standards, the peak height reduction of the standard signals will be
exactly the same as that of the sample signals and the effect will
cancel out exactly. In such cases smooth ratios from 0.5 to 1.0 can be
used if necessary to further improve the signal-to-noise ratio. In
practical analytical chemistry,
absolute
peak height measurements are seldom required; calibration against
standard solutions is the rule. (Remember: the objective
of a quantitative spectrophotometric procedure is not to measure
absorbance but rather to measure the concentration of the analyte.) It
is very important, however, to apply exactly the
same signal processing steps to the standard signals as to the sample signals, otherwise
a large systematic error may result.
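The cancellation described above follows from the fact that a sliding-average smooth is a linear operation. The sketch below illustrates it with a hypothetical standard of concentration 10 and a sample of concentration 3.7 (arbitrary units), both given the same heavy two-pass smooth. All names and numbers are invented for the example, and the peaks are noise-free so that only the effect of smoothing is seen.

```python
import math


def gaussian_peak(n, center, fwhm, height):
    sigma = fwhm / (2 * math.sqrt(2 * math.log(2)))
    return [height * math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(n)]


def sliding_average(y, m):
    """Rectangular smooth of odd width m; edge points are set to zero."""
    half = (m - 1) // 2
    s = [0.0] * len(y)
    for j in range(half, len(y) - half):
        s[j] = sum(y[j - half : j + half + 1]) / m
    return s


def heavy_smooth(y):
    """Two passes of a 41-point average on a 40-point-wide peak (ratio about 1)."""
    return sliding_average(sliding_average(y, 41), 41)


standard_conc = 10.0     # hypothetical concentration of the standard
true_sample_conc = 3.7   # hypothetical concentration of the sample

standard = gaussian_peak(301, 150, 40, 1.0)
sample = [v * true_sample_conc / standard_conc for v in standard]

smoothed_standard = heavy_smooth(standard)
smoothed_sample = heavy_smooth(sample)

remaining_height = max(smoothed_standard)  # fraction of the original height 1.0
measured_conc = standard_conc * max(smoothed_sample) / max(smoothed_standard)
print(round(remaining_height, 3), round(measured_conc, 6))
```

Both peaks lose the same fraction of their height, so the concentration computed from the ratio of smoothed heights matches the true value to within floating-point rounding.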
When should you smooth a signal? There are two reasons to smooth a signal: (1) for cosmetic reasons, to prepare a nicer-looking graphic of a signal for visual inspection or publication, and (2) if the signal will be subsequently processed by an algorithm that would be adversely affected by the presence of too much high-frequency noise in the signal, for example if the location of maxima, minima, or inflection points in the signal is to be automatically determined by detecting zero-crossings in derivatives of the signal. But one common situation where you should not smooth signals is prior to least-squares curve fitting, for two reasons: (a) all smoothing algorithms are at least slightly "lossy", entailing at least some change in signal shape and amplitude; and (b) it is harder to evaluate the fit by inspecting the residuals if the data are smoothed, because smoothed noise may be mistaken for an actual signal (see Curve Fitting A). If these requirements conflict, care must be used in the design of algorithms. For example, in a popular technique for peak finding and measurement, peaks are located by detecting downward zero-crossings in the smoothed first derivative, but the position, height, and width of each peak are determined by least-squares curve-fitting of a segment of original unsmoothed data in the vicinity of the zero-crossing. Thus, even if heavy smoothing is necessary to provide reliable discrimination against noise peaks, the peak parameters extracted by curve fitting are not distorted.
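The peak-finding strategy just described can be sketched as follows. This is not the author's peak finder: the function names, the amplitude threshold, and the fit window are hypothetical, the curve fit is a least-squares parabola fitted to the logarithm of the raw data (which assumes a roughly Gaussian peak with positive values), and a small deterministic ripple stands in for noise so that the run is repeatable.

```python
import math


def sliding_average(y, m):
    half = (m - 1) // 2
    s = [0.0] * len(y)
    for j in range(half, len(y) - half):
        s[j] = sum(y[j - half : j + half + 1]) / m
    return s


def first_derivative(y):
    """Central-difference derivative; the two end points are left at zero."""
    d = [0.0] * len(y)
    for j in range(1, len(y) - 1):
        d[j] = (y[j + 1] - y[j - 1]) / 2.0
    return d


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_gaussian(xs, ys):
    """Least-squares parabola fit to ln(y); returns (position, height, fwhm)."""
    x0 = sum(xs) / len(xs)
    u = [x - x0 for x in xs]  # centered x values for better conditioning
    ln_y = [math.log(v) for v in ys]
    sums = [sum(ui ** p for ui in u) for p in range(5)]
    a = [[sums[i + k] for k in range(3)] for i in range(3)]  # normal equations
    b = [sum(l * ui ** i for l, ui in zip(ln_y, u)) for i in range(3)]
    c0, c1, c2 = solve3(a, b)
    position = x0 - c1 / (2 * c2)
    height = math.exp(c0 - c1 ** 2 / (4 * c2))
    fwhm = 2 * math.sqrt(math.log(2) / -c2)
    return position, height, fwhm


def find_peaks(y, smooth_width, amp_threshold, fit_halfwidth):
    # Smoothing is applied only to the derivative used for locating peaks.
    d = sliding_average(first_derivative(y), smooth_width)
    peaks = []
    for j in range(1, len(y) - 1):
        if d[j] > 0 and d[j + 1] <= 0 and y[j] > amp_threshold:
            lo, hi = j - fit_halfwidth, j + fit_halfwidth + 1
            # The fit uses the raw, unsmoothed data around the zero-crossing.
            peaks.append(fit_gaussian(list(range(lo, hi)), y[lo:hi]))
    return peaks


sigma = 30 / (2 * math.sqrt(2 * math.log(2)))  # peak with a width of 30 points
raw = [math.exp(-0.5 * ((i - 100) / sigma) ** 2) + 0.02 * math.sin(2.7 * i)
       for i in range(201)]
peaks = find_peaks(raw, smooth_width=9, amp_threshold=0.5, fit_halfwidth=8)
print(peaks)
```

The amplitude threshold discards zero-crossings in the baseline, and because the fit never sees smoothed data, the reported height and width are not reduced or broadened by the smoothing used for detection.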
Video Demonstration. This 18-second, 3 MByte video (Smooth3.wmv) demonstrates the effect of triangular smoothing on a single Gaussian peak with a peak height of 1.0 and peak width of 200. The initial white noise amplitude is 0.3, giving an initial signal-to-noise ratio of about 3.3. The measurements of peak amplitude and peak width of the noisy signal, shown at the bottom of the video, are initially seriously inaccurate because of the noise. As the smooth width is increased, however, the signal-to-noise ratio improves and the accuracy of the measurements of peak amplitude and peak width improves. However, above a smooth width of about 40 (smooth ratio 0.2), the smoothing causes the peak to be shorter than 1.0 and wider than 200, even though the signal-to-noise ratio continues to improve as the smooth width is increased. (This demonstration was created in Matlab 6.5.)
The custom function fastsmooth implements all the types of smooths discussed above. (Click on this link to inspect the code, or right-click to download for use within Matlab.) Fastsmooth is a Matlab function of the form s=fastsmooth(a,w,type,edge). The argument "a" is the input signal vector; "w" is the smooth width (a positive integer); "type" determines the smooth type: type=1 gives a rectangular smooth (sliding-average or boxcar); type=2 gives a triangular smooth (equivalent to 2 passes of a sliding average); type=3 gives a pseudo-Gaussian smooth (equivalent to 3 passes of a sliding average). The argument "edge" controls how the "edges" of the signal (the first w/2 points and the last w/2 points) are handled. If edge=0, the edges are set to zero. (In this mode the elapsed time is independent of the smooth width, which gives the fastest execution time.) If edge=1, the edges are smoothed with progressively smaller smooths the closer to the end. (In this mode the execution time increases with increasing smooth width.) The smoothed signal is returned as the vector "s". (You can leave off the last two input arguments: fastsmooth(a,w,type) smooths with edge=0 and fastsmooth(a,w) smooths with type=1 and edge=0.) Compared to convolution-based smooth algorithms, fastsmooth typically gives much faster execution times, especially for large smooth widths; it can smooth a 1,000,000 point signal with a 1,000 point sliding average in less than 0.1 second.
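One common way to make the cost of a sliding average independent of the smooth width is to keep a running sum, adding the point that enters the window and subtracting the point that leaves it. The Python sketch below illustrates that idea together with the multi-pass smooth types; it is not a translation of fastsmooth, the names running_sum_average and fast_smooth_sketch are hypothetical, only odd widths are handled, and only the edge=0 behavior (zeroed edges) is imitated.

```python
def running_sum_average(y, w):
    """Sliding average of odd width w using a running sum; edges are zero."""
    n = len(y)
    s = [0.0] * n
    if w > n:
        return s
    half = (w - 1) // 2
    window_sum = sum(y[:w])  # sum of the first full window
    s[half] = window_sum / w
    for j in range(half + 1, n - half):
        # slide the window one point: add the new point, drop the oldest
        window_sum += y[j + half] - y[j - half - 1]
        s[j] = window_sum / w
    return s


def fast_smooth_sketch(y, w, smooth_type=1):
    """smooth_type 1, 2, 3 = one, two, or three passes of the sliding average."""
    if smooth_type not in (1, 2, 3):
        raise ValueError("smooth_type must be 1, 2, or 3")
    s = list(y)
    for _ in range(smooth_type):
        s = running_sum_average(s, w)
    return s


spike = [0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0]
one_pass = fast_smooth_sketch(spike, 3, smooth_type=1)
two_pass = fast_smooth_sketch(spike, 3, smooth_type=2)
print(one_pass)
print(two_pass)
```

Two passes of the 3-point average turn the spike into the pattern 1, 2, 3, 2, 1, which is the 5-point triangular weighting. Each pass does a fixed amount of work per point whatever the width, at the price of some accumulated rounding error on very long floating-point signals.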
Interactive Smoothing for Matlab is a Matlab module for interactive smoothing of time-series signals, with sliders that allow you to adjust the smoothing parameters continuously while observing the effect on your signal dynamically. Run SmoothSliderTest to see how it works. It can be used with any smoothing function, and it includes a self-contained interactive demo of the effect of smoothing on peak height, width, and signal-to-noise ratio. If you have access to Matlab, you may download the complete set of Matlab Interactive Smoothing m-files (12 Kbytes), InteractiveSmoothing.zip, so that you can experiment with all the variables at will and try out this technique on your own signals.
Note: you can right-click on any of the m-file links above and select Save Link As... to download them to your computer for use within Matlab.