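Smoothing algorithms. The simplest smoothing algorithm is the rectangular or unweighted sliding-average smooth; it simply replaces each point in the signal with the average of m adjacent points, where m is a positive integer called the smooth width. For example, for a 3-point smooth (m = 3):

$$S_j = \frac{Y_{j-1} + Y_j + Y_{j+1}}{3}$$

for j = 2 to n-1, where S_j is the jth point in the smoothed signal, Y_j is the jth point in the original signal, and n is the total number of points in the signal.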
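The triangular smooth is like the rectangular smooth, above, except that it implements a weighted smoothing function. The weights increase linearly up to the center point and decrease linearly after it. For a 5-point smooth (m = 5):

$$S_j = \frac{Y_{j-2} + 2Y_{j-1} + 3Y_j + 2Y_{j+1} + Y_{j+2}}{9}$$

for j = 3 to n-2.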
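Both equations generalize to any odd width m. The minimal Python sketch below is an illustration only and not the Matlab fastsmooth function. It applies both smooths and sets the (m-1)/2 end points that cannot be fully smoothed to zero. Extending the linear triangular weighting to other widths (1, 2, ..., (m+1)/2, ..., 2, 1) is an assumption; it reduces to the 1, 2, 3, 2, 1 weights above for m = 5.

```python
def rect_smooth(y, m):
    """Unweighted sliding-average smooth of odd width m.

    The (m-1)/2 points at each end, which cannot get a complete
    m-point smooth, are set to zero.
    """
    if m < 1 or m % 2 == 0:
        raise ValueError("smooth width m must be a positive odd integer")
    half = (m - 1) // 2
    s = [0.0] * len(y)
    for j in range(half, len(y) - half):
        s[j] = sum(y[j - half:j + half + 1]) / m
    return s


def tri_smooth(y, m):
    """Triangular (weighted) smooth of odd width m.

    Weights rise linearly to the centre point and fall again,
    e.g. (1, 2, 3, 2, 1) / 9 for m = 5. End points are set to zero.
    """
    if m < 1 or m % 2 == 0:
        raise ValueError("smooth width m must be a positive odd integer")
    half = (m - 1) // 2
    offsets = range(-half, half + 1)
    weights = [half + 1 - abs(k) for k in offsets]
    total = sum(weights)
    s = [0.0] * len(y)
    for j in range(half, len(y) - half):
        s[j] = sum(w * y[j + k] for w, k in zip(weights, offsets)) / total
    return s


y = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0]
print(rect_smooth(y, 3))
print(tri_smooth(y, 5))
```

A 5-point triangular smooth gives the same interior values as two passes of a 3-point rectangular smooth. This is because the weights (1, 1, 1) convolved with (1, 1, 1) give (1, 2, 3, 2, 1).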
End effects and the lost points problem. Note in the equations above that the 3-point rectangular smooth is defined only for j = 2 to n-1. There is not enough data in the signal to define a complete 3-point smooth for the first point in the signal (j = 1) or for the last point (j = n), because there are no data points before the first point or after the last point. (Similarly, a 5-point smooth is defined only for j = 3 to n-2, so a smooth cannot be calculated for the first two points or for the last two points.) In general, for an m-width smooth, there will be (m-1)/2 points at the beginning of the signal and (m-1)/2 points at the end of the signal for which a complete m-width smooth cannot be calculated. What to do? There are two approaches. One is to accept the loss of points and either trim them off or replace them with zeros in the smoothed signal. (That's the approach taken in most of the figures in this paper.) The other approach is to use progressively smaller smooths at the ends of the signal, for example to use 2, 3, 5, 7... point smooths for signal points 1, 2, 3, and 4..., and for points n, n-1, n-2, n-3..., respectively. The latter approach may be preferable if the edges of the signal contain critical information, but it increases execution time. The fastsmooth function discussed below can use either of these two methods.
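The second approach can be sketched as follows. This minimal Python illustration assumes that each shrunken window stays centered on its point, which reproduces the 2, 3, 5, 7... sequence described above. It is not the code of the fastsmooth function.

```python
def rect_smooth_tapered(y, m):
    """Rectangular smooth of odd width m that keeps every point.

    Interior points get the full m-point average. Closer to the ends
    the window shrinks so that it stays centred: 3, 5, 7, ... points for
    the 2nd, 3rd, 4th, ... point from either end, and a 2-point average
    for the very first and very last points.
    """
    if m < 1 or m % 2 == 0:
        raise ValueError("smooth width m must be a positive odd integer")
    n = len(y)
    if m == 1 or n < 2:
        return list(y)
    half = (m - 1) // 2
    s = [0.0] * n
    for j in range(n):
        k = min(j, n - 1 - j, half)  # widest centred half-window that fits
        if k == 0:
            # first or last point: average it with its only neighbour
            s[j] = (y[0] + y[1]) / 2 if j == 0 else (y[-1] + y[-2]) / 2
        else:
            s[j] = sum(y[j - k:j + k + 1]) / (2 * k + 1)
    return s


ramp = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
print(rect_smooth_tapered(ramp, 5))  # [1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 8.5]
```

On a straight line, every centered window returns the original value. Only the two end points change (1.5 and 8.5), which shows the small bias that the one-sided 2-point end averages introduce.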
Examples of smoothing. A simple example of smoothing is shown in Figure 4. The left half of this signal is a noisy peak. The right half is the same peak after undergoing a triangular smoothing algorithm. The noise is greatly reduced while the peak itself is hardly changed. Smoothing increases the signal-to-noise ratio and allows the signal characteristics (peak position, height, width, area, etc.) to be measured more accurately, especially when computer-automated methods of locating and measuring peaks are being employed.
Figure 4. The left half of this signal is a noisy peak. The right half is the same peak after undergoing a smoothing algorithm. The noise is greatly reduced while the peak itself is hardly changed, making it easier to measure the peak position, height, and width directly by graphical or visual estimation, but it does not improve measurements made by least-squares methods (see below).
The larger the smooth width, the greater the noise reduction, but also the greater the possibility that the signal will be distorted by the smoothing operation. The optimum choice of smooth width depends upon the width and shape of the signal and the digitization interval. For peak-type signals, the critical factor is the smooth ratio, the ratio between the smooth width m and the number of points in the half-width of the peak. In general, increasing the smooth ratio improves the signal-to-noise ratio but causes a reduction in amplitude and an increase in the width of the peak.
The figures above show examples of the effect of three different
smooth widths on noisy Gaussian-shaped peaks. In the figure on the
left, the peak has a (true) height of 2.0 and there are 80 points
in the half-width of the peak. The red line is the original
unsmoothed peak. The three superimposed green lines are the
results of smoothing this peak with a triangular smooth of width
(from top to bottom) 7, 25, and 51 points. Because the peak half-width
is 80 points, the smooth ratios of these three smooths are
7/80 = 0.09, 25/80 = 0.31, and 51/80 = 0.64, respectively. As the
smooth width increases, the noise is progressively reduced but the
peak height also is reduced slightly. For the largest smooth, the
peak width is slightly increased. In the figure on the right, the
original peak (in red) has a true height of 1.0 and a half-width
of 33 points. (It is also less noisy than the example on the
left.) The three superimposed green lines are the results of the
same three triangular smooths of width (from top to bottom) 7, 25,
and 51 points. But because the peak half-width in this case is only 33
points, the smooth ratios of these three smooths are
larger - 0.21, 0.76, and 1.55, respectively. You can see that the
peak distortion effect (reduction of peak height and increase in
peak width) is greater for the narrower peak because the smooth
ratios are higher. Smooth ratios of greater than 1.0 are seldom
used because of excessive peak distortion. Note that even in the
worst case, the peak positions are not affected (assuming that the
original peaks were symmetrical and not overlapped by other
peaks). If retaining the shape of the peak is more important than
optimizing the signal-to-noise ratio, the Savitzky-Golay smooth has the
advantage over sliding-average smooths.
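The trend described above can be checked numerically. This Python sketch smooths a noise-free Gaussian with a half-width of 33 points, as in the right-hand figure, using the same three triangular smooth widths. It reports the height, the position, an approximate width (counted as the number of points at or above half the maximum), and the area. Leaving out the noise and using the linear-weight triangular smooth defined in the code are simplifying assumptions, so the numbers will not match the figure exactly.

```python
import math


def tri_smooth(y, m):
    """Triangular smooth of odd width m (weights 1, 2, ..., 2, 1); ends set to zero."""
    half = (m - 1) // 2
    offsets = range(-half, half + 1)
    weights = [half + 1 - abs(k) for k in offsets]
    total = sum(weights)
    s = [0.0] * len(y)
    for j in range(half, len(y) - half):
        s[j] = sum(w * y[j + k] for w, k in zip(weights, offsets)) / total
    return s


def gaussian(n, center, fwhm, height=1.0):
    """Noise-free Gaussian sampled at integer positions 0..n-1."""
    sigma = fwhm / (2 * math.sqrt(2 * math.log(2)))
    return [height * math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(n)]


def peak_stats(s):
    """Height, index of the maximum, points at or above half height, and area."""
    top = max(s)
    return top, s.index(top), sum(1 for v in s if v >= top / 2), sum(s)


half_width = 33
y = gaussian(401, 200, half_width)
results = {m: peak_stats(tri_smooth(y, m)) for m in (7, 25, 51)}
print("original: height %.3f position %d width %d area %.3f" % peak_stats(y))
for m, (h, pos, w, area) in results.items():
    print(f"m={m:2d} ratio={m / half_width:.2f}: height {h:.3f} position {pos} "
          f"width {w} area {area:.3f}")
```

The smoothing weights sum to one, so the area is preserved (apart from end effects). Meanwhile the height drops and the width grows as the smooth ratio increases. The position stays at the center because both the peak and the smoothing function are symmetrical.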
It's important to point out that smoothing results such as those
illustrated in the figures above may be deceptively optimistic,
because they employ a single sample of a noisy signal that is
smoothed to different degrees. Smoothing is essentially a type of
low-pass filtering that reduces the high-frequency components of a
signal while retaining the low-frequency components. This causes
the viewer to overestimate the quality of a smoothed noisy signal,
because one tends to underestimate the contribution of
low-frequency noise, which is hard to estimate visually because
there are so few low-frequency cycles in the signal record. This
error can be remedied by taking a large number of independent
samples of the noisy signal. The same sort of error occurs when
least-squares methods are used to measure parameters such as the
slope, intercept, height, position, and width of noisy signals.
The figure on the right is another example
signal that illustrates some of these principles. You can
download the data file "udx" in TXT format
or in Matlab MAT format. The signal
consists of two Gaussian peaks, one located at x=50 and the second
at x=150. Both peaks have a peak height of 1.0 and a peak
half-width of 10, and a normally-distributed random white noise
with a standard deviation of 0.1 has been added to the entire
signal. The x-axis sampling interval, however, is different for
the two peaks; it's 0.1 for the first peak and 1.0 for the second
peak. This means that the first peak is characterized by ten
times more points than the second peak. It may look like the first peak is
noisier than the second, but that's just an illusion; the
signal-to-noise ratio for both peaks is 10. The second peak looks
less noisy only because there are fewer noise samples there and we
tend to underestimate the dispersion of small samples. The result
of this is that when the signal is smoothed, the second peak is
much more likely to be distorted by the smooth (it becomes shorter
and wider) than the first peak. The first peak can tolerate a much
wider smooth width, resulting in a greater degree of noise
reduction. (Similarly, if both peaks are measured with the peakfit method, the results
on the first peak will be about 3 times more accurate than those on the
second peak, because there are 10 times more data points in that
peak, and the measurement precision improves roughly with the
square root of the number of data points if the noise is
white).
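The square-root argument can be made concrete with a simplified calculation. Suppose the position and width are known, so that only the height h is fitted. The least-squares estimate of h for the model y = h*g(x) is then sum(g*y) / sum(g**2). Under white noise with standard deviation sigma, its standard deviation is sigma / sqrt(sum(g**2)). The Python sketch below evaluates this for the two peaks of the "udx" example. The peak positions, half-width, noise level, and intervals come from the description above. Splitting the x-axis at 100 and fitting only the height are assumptions; peakfit fits more parameters.

```python
import math


def gaussian(x, center, fwhm, height=1.0):
    sigma = fwhm / (2 * math.sqrt(2 * math.log(2)))
    return height * math.exp(-((x - center) ** 2) / (2 * sigma ** 2))


def height_sd(xs, center, fwhm, noise_sd):
    """Standard deviation of the least-squares height estimate.

    With position and width assumed known, y = h * g(x) is linear in h,
    h_hat = sum(g * y) / sum(g**2), and its standard deviation under
    white noise is noise_sd / sqrt(sum(g**2)).
    """
    sum_g2 = sum(gaussian(x, center, fwhm) ** 2 for x in xs)
    return noise_sd / math.sqrt(sum_g2)


first_x = [0.1 * i for i in range(1001)]          # x = 0 to 100, interval 0.1
second_x = [100.0 + 1.0 * i for i in range(101)]  # x = 100 to 200, interval 1.0
sd_first = height_sd(first_x, 50.0, 10.0, 0.1)
sd_second = height_sd(second_x, 150.0, 10.0, 0.1)
print(f"first peak {sd_first:.4f}, second peak {sd_second:.4f}, "
      f"ratio {sd_second / sd_first:.2f}")
```

The ratio comes out at about 3.16, the square root of 10, which is the "about 3 times" quoted above.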
Optimization of smoothing. Which is the best smooth ratio? It depends on the purpose of the peak measurement. If the objective of the measurement is to measure the true peak height and width, then smooth ratios below 0.2 should be used. (In the example on the left above, the original peak (red line) has a peak height greater than the true value 2.0 because of the noise, whereas the smoothed peak with a smooth ratio of 0.09 has a peak height that is much closer to the correct value.) Measuring the height of noisy peaks is much better done by curve fitting the unsmoothed data rather than by taking the maximum of the smoothed data (see CurveFittingC.html#Smoothing). But if the objective of the measurement is to measure the peak position (x-axis value of the peak), much larger smooth ratios can be employed if desired, because smoothing has no effect at all on the position of a symmetrical peak (unless the increase in peak width is so great that it causes adjacent peaks to overlap).
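A simulation over many independent noisy samples shows the bias of taking the maximum, and it also addresses the single-sample caveat discussed earlier. This Python sketch makes several assumptions:

- a Gaussian with a height of 1.0 and a half-width of 20 points;
- white noise with a standard deviation of 0.1;
- a 5-point rectangular smooth (smooth ratio 0.25);
- a least-squares fit of the height alone, with the position and width assumed known (a simplification of full curve fitting).

```python
import math
import random
import statistics


def rect_smooth(y, m):
    """m-point sliding average (m odd); incomplete end points set to zero."""
    half = (m - 1) // 2
    s = [0.0] * len(y)
    for j in range(half, len(y) - half):
        s[j] = sum(y[j - half:j + half + 1]) / m
    return s


def one_trial(rng, n=200, center=100, fwhm=20.0, noise_sd=0.1, m=5):
    """Measure the height of one noisy Gaussian (true height 1.0) three ways."""
    sigma = fwhm / (2 * math.sqrt(2 * math.log(2)))
    g = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(n)]
    y = [gi + rng.gauss(0.0, noise_sd) for gi in g]
    raw_max = max(y)
    smoothed_max = max(rect_smooth(y, m))
    # Least squares with known position and width: y = h * g is linear in h.
    lsq = sum(gi * yi for gi, yi in zip(g, y)) / sum(gi * gi for gi in g)
    return raw_max, smoothed_max, lsq


rng = random.Random(1)          # fixed seed so the run is repeatable
trials = [one_trial(rng) for _ in range(500)]
labels = ("max of raw data", "max of smoothed data", "least-squares height")
means = []
for col, label in enumerate(labels):
    values = [t[col] for t in trials]
    means.append(statistics.mean(values))
    print(f"{label:22s} mean {means[-1]:.3f}  sd {statistics.stdev(values):.3f}")
```

Expect the maximum of the raw data to read noticeably high and the maximum of the smoothed data to come closer to 1.0. The least-squares estimate should be essentially unbiased.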
In quantitative analysis applications, the peak height reduction caused by smoothing is not so important, because in most cases calibration is based on the signals of standard solutions. If the same signal processing operations are applied to the samples and to the standards, the peak height reduction of the standard signals will be exactly the same as that of the sample signals and the effect will cancel out exactly. In such cases smooth ratios from 0.5 to 1.0 can be used if necessary to further improve the signal-to-noise ratio. In practical analytical chemistry, absolute peak height measurements are seldom required; calibration against standard solutions is the rule. (Remember: the objective of quantitative analysis is not to measure a signal but rather to measure the concentration of the analyte.) It is very important, however, to apply exactly the same signal processing steps to the standard signals as to the sample signals, otherwise a large systematic error may result.
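Smoothing is a linear operation, so it scales the peaks of a standard and a sample by the same factor when they have the same shape. This Python sketch checks that with hypothetical noise-free peaks and a smooth ratio of about 0.83. The concentrations, peak shape, and response factor are all made-up values. The sketch then shows the systematic error that appears when only the sample signal is smoothed.

```python
import math


def tri_smooth(y, m):
    """Triangular smooth of odd width m (weights 1, 2, ..., 2, 1); ends set to zero."""
    half = (m - 1) // 2
    offsets = range(-half, half + 1)
    weights = [half + 1 - abs(k) for k in offsets]
    total = sum(weights)
    s = [0.0] * len(y)
    for j in range(half, len(y) - half):
        s[j] = sum(w * y[j + k] for w, k in zip(weights, offsets)) / total
    return s


def gaussian_peak(n, center, fwhm, height):
    sigma = fwhm / (2 * math.sqrt(2 * math.log(2)))
    return [height * math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(n)]


# Hypothetical calibration: peak height = 0.8 * concentration, same peak shape
standard_conc, sample_conc = 2.0, 0.5
standard = gaussian_peak(301, 150, 30, 0.8 * standard_conc)
sample = gaussian_peak(301, 150, 30, 0.8 * sample_conc)

m = 25  # smooth ratio 25/30, about 0.83
h_std = max(tri_smooth(standard, m))
h_smp = max(tri_smooth(sample, m))
same_processing = standard_conc * h_smp / h_std
sample_only = standard_conc * h_smp / max(standard)  # standard left unsmoothed
print(f"standard height {max(standard):.3f} -> {h_std:.3f} after smoothing")
print(f"estimated concentration, both smoothed: {same_processing:.4f}")
print(f"estimated concentration, sample only:   {sample_only:.4f}")
```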
For a comparison of all four smoothing types considered above,
see SmoothingComparison.html.
When should you smooth a signal? There are two reasons to smooth a signal: (1) for cosmetic reasons, to prepare a nicer-looking graphic of a signal for visual inspection or publication, and (2) if the signal will be subsequently processed by an algorithm that would be adversely affected by the presence of too much high-frequency noise in the signal, for example if the heights of peaks are to be determined graphically or by using the MAX function, or if the location of maxima, minima, or inflection points in the signal is to be automatically determined by detecting zero-crossings in derivatives of the signal. Optimization of the amount and type of smoothing is very important in these cases (see Differentiation.html#Smoothing).
Care must be used in the design of algorithms that employ smoothing. For example, in a popular technique for peak finding and measurement, peaks are located by detecting downward zero-crossings in the smoothed first derivative, but the position, height, and width of each peak are determined by least-squares curve-fitting of a segment of the original unsmoothed data in the vicinity of the zero-crossing. Thus, even if heavy smoothing is necessary to provide reliable discrimination against noise peaks, the peak parameters extracted by curve fitting are not distorted by the smoothing.
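That two-stage design can be sketched in Python. Peaks are located at downward zero-crossings of a smoothed first derivative, and each one is then measured on the unsmoothed data. As a stand-in for a full least-squares Gaussian fit, this sketch fits a parabola by least squares to the logarithm of a short segment of the original data; for a Gaussian peak this gives the position, height, and width directly. That substitution, the amplitude threshold, and the parameter names are all assumptions for illustration. The logarithm needs every point in the fitted segment to be positive, so segments that reach zero or below are skipped.

```python
import math


def rect_smooth(y, m):
    """m-point sliding average (m odd); incomplete end points set to zero."""
    half = (m - 1) // 2
    s = [0.0] * len(y)
    for j in range(half, len(y) - half):
        s[j] = sum(y[j - half:j + half + 1]) / m
    return s


def first_derivative(y):
    """Central-difference derivative per point (end values copied inward)."""
    d = [0.0] * len(y)
    for j in range(1, len(y) - 1):
        d[j] = (y[j + 1] - y[j - 1]) / 2
    d[0], d[-1] = d[1], d[-2]
    return d


def fit_parabola(u, v):
    """Least-squares fit of v = a + b*u + c*u**2 via the normal equations."""
    p = [sum(ui ** k for ui in u) for k in range(5)]
    q = [sum(vi * ui ** k for ui, vi in zip(u, v)) for k in range(3)]
    aug = [[p[0], p[1], p[2], q[0]],
           [p[1], p[2], p[3], q[1]],
           [p[2], p[3], p[4], q[2]]]
    for i in range(3):                       # forward elimination
        for r in range(i + 1, 3):
            f = aug[r][i] / aug[i][i]
            for col in range(i, 4):
                aug[r][col] -= f * aug[i][col]
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):                      # back substitution
        tail = sum(aug[i][col] * coef[col] for col in range(i + 1, 3))
        coef[i] = (aug[i][3] - tail) / aug[i][i]
    return coef[0], coef[1], coef[2]


def find_and_measure(x, y, smooth_width=7, fit_points=9, min_height=0.1):
    """Locate peaks on the smoothed derivative, measure them on the raw data."""
    d = rect_smooth(first_derivative(y), smooth_width)
    half = (smooth_width - 1) // 2
    peaks = []
    for j in range(half, len(y) - half - 1):
        if d[j] > 0 >= d[j + 1] and y[j] > min_height:   # downward zero-crossing
            lo = max(0, j - fit_points // 2)
            seg = range(lo, min(len(y), lo + fit_points))
            if any(y[i] <= 0 for i in seg):
                continue
            a, b, c = fit_parabola([x[i] - x[j] for i in seg],
                                   [math.log(y[i]) for i in seg])
            if c >= 0:                       # not peak-shaped
                continue
            sigma = math.sqrt(-1 / (2 * c))
            peaks.append((x[j] - b / (2 * c),                        # position
                          math.exp(a - b * b / (4 * c)),              # height
                          2 * math.sqrt(2 * math.log(2)) * sigma))    # FWHM
    return peaks


def gaussian(xv, center, fwhm, height):
    sigma = fwhm / (2 * math.sqrt(2 * math.log(2)))
    return height * math.exp(-((xv - center) ** 2) / (2 * sigma ** 2))


x = [0.5 * i for i in range(201)]
y = [gaussian(xi, 30.0, 8.0, 1.0) + gaussian(xi, 70.0, 10.0, 0.6) for xi in x]
for position, height, width in find_and_measure(x, y):
    print(f"position {position:.3f}  height {height:.3f}  width {width:.3f}")
```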
When should you NOT smooth a signal? One common situation where you
should not smooth signals is prior to statistical procedures such as
least-squares curve fitting, because: (a) smoothing will not
significantly improve the accuracy of parameters measured by least
squares, as judged by the variation between separate independent
signal samples; (b) all smoothing algorithms are at least slightly
"lossy", entailing at least some change in signal shape and
amplitude; (c) it is harder to evaluate the fit by inspecting the
residuals if the data are smoothed, because smoothed noise may be
mistaken for an actual signal; and (d) smoothing the signal will
cause propagation-of-error calculations and the bootstrap method to
seriously underestimate the errors in the parameters. Smoothing can
be used to locate peaks, but it should not be used to measure peaks.
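Points (c) and (d) can be illustrated with pure noise. This Python simulation uses an arbitrary seed and signal length and smooths white noise with a 5-point rectangular smooth. The scatter drops by about the square root of 5. Neighboring points also become strongly correlated: the lag-1 autocorrelation of an m-point sliding average of white noise is (m-1)/m. Error estimates that treat residual points as independent would therefore come out too small.

```python
import math
import random
import statistics


def sliding_average(y, m):
    """m-point sliding average, keeping only the complete (interior) points."""
    return [sum(y[j:j + m]) / m for j in range(len(y) - m + 1)]


def lag1_autocorrelation(v):
    """Correlation between each point and the next one."""
    mu = statistics.fmean(v)
    num = sum((a - mu) * (b - mu) for a, b in zip(v, v[1:]))
    den = sum((a - mu) ** 2 for a in v)
    return num / den


rng = random.Random(0)                 # fixed seed, arbitrary choice
noise = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
m = 5
smoothed = sliding_average(noise, m)
print(f"sd: raw {statistics.stdev(noise):.3f}, smoothed {statistics.stdev(smoothed):.3f}, "
      f"1/sqrt(m) = {1 / math.sqrt(m):.3f}")
print(f"lag-1 autocorrelation: raw {lag1_autocorrelation(noise):.3f}, "
      f"smoothed {lag1_autocorrelation(smoothed):.3f}, (m-1)/m = {(m - 1) / m:.3f}")
```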
Dealing with spikes. Sometimes signals are contaminated with very tall, narrow “spikes” occurring at random intervals and with random amplitudes, but with widths of only one or a few points. Such spikes not only look ugly, but they also upset the assumptions of least-squares computations because they are not normally-distributed random noise. This type of interference is difficult to eliminate using the above smoothing methods without distorting the signal. However, a “median” filter, which replaces each point in the signal with the median (rather than the average) of m adjacent points, can completely eliminate narrow spikes with little change in the signal. This works provided that the spikes are only one or a few points wide and narrower than half of m (that is, no more than (m-1)/2 points wide for odd m). It can be applied prior to least-squares functions. See http://en.wikipedia.org/wiki/Median_filter.
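A median filter takes only a few lines. This Python sketch is an illustration and not the medianfilter.m function. It uses an odd window of m points and, as an assumption, simply shortens the window at the ends of the signal. The usage lines show the width limit: a 2-point spike survives a 3-point median filter but is removed by a 5-point one.

```python
import statistics


def median_filter(y, m):
    """Replace each point by the median of the m points centred on it (m odd).

    Near the ends the window is simply cut short (an assumption; other
    end treatments are possible).
    """
    if m < 1 or m % 2 == 0:
        raise ValueError("m must be a positive odd integer")
    half = m // 2
    n = len(y)
    return [statistics.median(y[max(0, j - half):min(n, j + half + 1)]) for j in range(n)]


baseline = [1.0] * 12
one_point = baseline[:]
one_point[5] = 9.0                  # 1-point spike
two_point = baseline[:]
two_point[5] = two_point[6] = 9.0   # 2-point spike

print(median_filter(one_point, 3))  # spike removed
print(median_filter(two_point, 3))  # spike survives: wider than (3-1)/2 = 1 point
print(median_filter(two_point, 5))  # spike removed: no wider than (5-1)/2 = 2 points
```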
Condensing oversampled signals. Sometimes signals are recorded more densely (that is, with smaller x-axis intervals) than really necessary to capture all the features of the signal. This results in larger-than-necessary data sizes, which slow down signal processing procedures and may tax storage capacity. To correct this, oversampled signals can be reduced in size either by eliminating data points (say, dropping every other point or every third point) or by replacing groups of adjacent points by their averages. The latter approach has the advantage of using rather than discarding extraneous data points, and it acts like smoothing to provide some measure of noise reduction. (If the noise in the original signal is white, and the signal is condensed by averaging every n points, the noise is reduced in the condensed signal by a factor of the square root of n, with no change in the frequency distribution of the noise.)
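Condensing by averaging is straightforward to sketch. This minimal Python version is an illustration only. It averages non-overlapping groups of n points and, as an assumption, drops any leftover points at the end. It then checks the square-root-of-n noise reduction on simulated white noise with an arbitrary seed.

```python
import random
import statistics


def condense(y, n):
    """Replace each group of n adjacent points by their average.

    Leftover points at the end that do not fill a whole group are
    dropped (an assumption; they could also be averaged as a shorter group).
    """
    return [sum(y[i:i + n]) / n for i in range(0, len(y) - n + 1, n)]


print(condense([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], 2))   # [1.5, 3.5, 5.5]

rng = random.Random(2)
noise = [rng.gauss(0.0, 1.0) for _ in range(40_000)]   # white noise, sd 1
print(statistics.stdev(noise), statistics.stdev(condense(noise, 4)))  # about 1 and 0.5
```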
Video Demonstration. This 18-second, 3 MByte video (Smooth3.wmv) demonstrates the effect of triangular smoothing on a single Gaussian peak with a peak height of 1.0 and a peak width of 200. The initial white noise amplitude is 0.3, giving an initial signal-to-noise ratio of about 3.3. Measurements of the peak amplitude and peak width of the noisy signal, shown at the bottom of the video, are initially seriously inaccurate because of the noise. As the smooth width is increased, however, the signal-to-noise ratio improves and the measurements of peak amplitude and peak width become more accurate. However, above a smooth width of about 40 (smooth ratio 0.2), the smoothing causes the peak to be shorter than 1.0 and wider than 200, even though the signal-to-noise ratio continues to improve as the smooth width is increased. (This demonstration was created in Matlab 6.5.)
Diederick has published a Savitzky-Golay smooth function in Matlab, which you can download from the Matlab File Exchange.
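For comparison with the sliding-average smooths, here is a minimal Python sketch of a 5-point Savitzky-Golay smooth. It uses the standard published quadratic/cubic smoothing coefficients (-3, 12, 17, 12, -3)/35 and is an independent illustration, not Diederick's function. These weights preserve any polynomial up to the third degree exactly, so a quadratic passes through unchanged. A 5-point rectangular smooth, by contrast, adds a constant offset to it. A narrow peak also loses less height with this smooth than with the rectangular one.

```python
import math

SG5 = (-3, 12, 17, 12, -3)   # 5-point quadratic/cubic Savitzky-Golay weights
SG5_NORM = 35                # sum of the weights


def sg5_smooth(y):
    """5-point Savitzky-Golay smooth; the two points at each end are set to zero."""
    s = [0.0] * len(y)
    for j in range(2, len(y) - 2):
        s[j] = sum(c * y[j + k] for c, k in zip(SG5, range(-2, 3))) / SG5_NORM
    return s


def rect5_smooth(y):
    """5-point unweighted sliding average, ends set to zero, for comparison."""
    s = [0.0] * len(y)
    for j in range(2, len(y) - 2):
        s[j] = sum(y[j - 2:j + 3]) / 5
    return s


quadratic = [float(i * i) for i in range(10)]
print(sg5_smooth(quadratic)[2:8])    # same as quadratic[2:8]
print(rect5_smooth(quadratic)[2:8])  # each value raised by 2

sigma = 10 / (2 * math.sqrt(2 * math.log(2)))  # peak half-width of 10 points
peak = [math.exp(-((i - 20) ** 2) / (2 * sigma ** 2)) for i in range(41)]
print(max(sg5_smooth(peak)), max(rect5_smooth(peak)))
```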
Here's a simple experiment in Matlab or Octave that creates a Gaussian peak, smooths it, compares the smoothed and unsmoothed version, then uses the peakfit.m function (version 3.4 or later) to show that smoothing reduces the peak height (from 1 to 0.786) and increases the peak width (from 1.66 to 2.12), but has little effect on the total peak area (a mere 0.2% change). In fact, there is no need to smooth the data if the peak height, position, and/or width will be measured by least-squares methods, because the results obtained on the unsmoothed data will be more accurate (see CurveFittingC.html#Smoothing).
>> x=[0:.1:10]';

The Matlab/Octave user-defined function medianfilter.m, medianfilter(y,w), performs a median-based filter operation that replaces each value of y with the median of w adjacent points, where w must be a positive integer.
ProcessSignal is a Matlab/Octave command-line function that performs smoothing and differentiation on the time-series data set x,y (column or row vectors). It can employ all the types of smoothing described above. Type "help ProcessSignal" for details. It returns the processed signal as a vector that has the same shape as x, regardless of the shape of y. The syntax is Processed=ProcessSignal(x,y,DerivativeMode,w,type,ends,Sharpen,factor1,factor2,SlewRate,MedianWidth)
iSignal is an interactive function for Matlab that performs smoothing for time-series signals using all the algorithms discussed above, including the Savitzky-Golay smooth, with keystrokes that allow you to adjust the smoothing parameters continuously while observing the effect on your signal instantly. Version 2.2 also includes a median filter and a condense function. Other functions include differentiation, peak sharpening, and least-squares peak measurement. View the code here or download the ZIP file with sample data for testing.
iSignal for Matlab.
Note: you can right-click on any of the m-file links on this site and select Save Link As... to download them to your computer for use within Matlab. Unfortunately, iSignal does not currently work in Octave.