Smoothing algorithms. Most smoothing algorithms are based on the "shift and multiply" technique, in which a group of adjacent points in the original data are multiplied point-by-point by a set of numbers (coefficients) that defines the smooth shape, the products are added up and divided by the sum of the coefficients, which becomes one point of smoothed data, then the set of coefficients is shifted one point down the original data and the process is repeated. The simplest smoothing algorithm is the rectangular boxcar or unweighted sliding-average smooth; it simply replaces each point in the signal with the average of m adjacent points, where m is a positive integer called the smooth width. For example, for a 3-point smooth (m = 3):
The triangular smooth is like the rectangular smooth, above, except that it implements a weighted smoothing function. For a 5-point smooth (m = 5):
|Original unsmoothed noise||1|
|Smoothed white noise||0.1|
|Smoothed pink noise||0.55|
|Smoothed blue noise||0.01|
Examples of smoothing.A
simple example of smoothing is shown in Figure 4. The left
half of this signal is a noisy peak. The right half is the
same peak after undergoing a triangular smoothing algorithm.
The noise is greatly reduced while the peak itself is hardly
changed. Smoothing increases the signal-to-noise ratio and
allows the signal characteristics (peak position, height,
width, area, etc.) to be measured more accurately by visual
Figure 4. The left half of this signal is a noisy peak. The right half is the same peak after undergoing a smoothing algorithm. The noise is greatly reduced while the peak itself is hardly changed, making it easier to measure the peak position, height, and width directly by graphical or visual estimation (but it does not improve measurements made by least-squares methods; see below).
The larger the smooth width, the greater the noise reduction, but also the greater the possibility that the signal will be distorted by the smoothing operation. The optimum choice of smooth width depends upon the width and shape of the signal and the digitization interval. For peak-type signals, the critical factor is the smoothing ratio, the ratio between the smooth width m and the number of points in the half-width of the peak. In general, increasing the smoothing ratio improves the signal-to-noise ratio but causes a reduction in amplitude and in increase in the bandwidth of the peak.
The figures above show examples of the effect of
three different smooth widths on noisy Gaussian-shaped
peaks. In the figure on the left, the peak has a (true)
height of 2.0 and there are 80 points in the half-width of
the peak. The red line is the original unsmoothed peak. The
three superimposed green lines are the results of smoothing
this peak with a triangular smooth of width (from top to
bottom) 7, 25, and 51 points. Because the peak width is 80
points, the smooth ratios of these three smooths are
7/80 = 0.09, 25/80 = 0.31, and 51/80 = 0.64, respectively.
As the smooth width increases, the noise is progressively
reduced but the peak height also is reduced slightly. For
the largest smooth, the peak width is slightly increased. In
the figure on the right, the original peak (in red) has a
true height of 1.0 and a half-width of 33 points. (It is
also less noisy than the example on the left.) The three
superimposed green lines are the results of the same three
triangular smooths of width (from top to bottom) 7, 25, and
51 points. But because the peak width in this case is only
33 points, the smooth ratios of these three smooths
are larger - 0.21, 0.76, and 1.55, respectively. You can see
that the peak distortion effect (reduction of peak height
and increase in peak width) is greater for the narrower peak
because the smooth ratios are higher. Smooth ratios of
greater than 1.0 are seldom used because of excessive peak
distortion. Note that even in the worst case, the peak
positions are not effected (assuming that the original peaks
were symmetrical and not overlapped by other peaks). If
retaining the shape of the peak is more important than
optimizing the signal-to-noise ratio, the Savitzky-Golay has
the advantage over sliding-average smooths. In all cases,
the total area under the peak
It should be clear that smoothing can seldom
completely eliminate noise, because most noise is spread out
over a wide range of frequencies, and smoothing simply
reduces the noise in part of its frequency range.
Only for some very specific types of noise (e.g. discrete
frequency noise or single-point spikes) is there hope of
anything close to complete noise elimination.
The figure on the right below is another example signal that illustrates some of these principles. The signal consists of two Gaussian peaks, one located at x=50 and the second at x=150. Both peaks have a peak height of 1.0 and a peak half-width of 10, and a normally-distributed random white noise with a standard deviation of 0.1 has been added to the entire signal. The x-axis sampling interval, however, is different for the two peaks; it's 0.1 for the first peak (from x=0 to 100) and 1.0 for the second peak (from x=100 to 200). This means that the first peak is characterized by ten times more points that the second peak. It may look like the first peak is noisier than the second, but that's just an illusion; the signal-to-noise ratio for both peaks is 10. The second peak looks less noisy only because there are fewer noise samples there and we tend to underestimate the dispersion of small samples. The result of this is that when the signal is smoothed, the second peak is much more likely to be distorted by the smooth (it becomes shorter and wider) than the first peak. The first peak can tolerate a much wider smooth width, resulting in a greater degree of noise reduction. (Similarly, if both peaks are measured with the least-squares curve fitting method, the fit of the first peak is more stable with the noise and the measured parameters of that peak will be about 3 times more accurate than the second peak, because there are 10 times more data points in that peak, and the measurement precision improves roughly with the square root of the number of data points if the noise is white). You can download the data file "udx" in TXT format or in Matlab MAT format.
Optimization of smoothing.
smoothing ratio increases, noise is reduced quickly at
first, then more slowly, and the peak height is also
reduced, slowly at first,
then more quickly. The result is that the signal-to-noise
increases quickly at first, then reaches a maximum. This is
illustrated in the figure on the left for a Gaussian peak
with white noise (produced by the Matlab/Octave script SmoothWidthTest.m).
Which is the best smooth ratio? It depends on the purpose of the peak measurement. If the objective of the measurement is to measure the true peak height and width, then smooth ratios below 0.2 should be used and the Savitzky-Golay smooth is preferred. Measuring the height of noisy peaks is much better done by curve fitting the unsmoothed data rather than by taking the maximum of the smoothed data (see CurveFittingC.html#Smoothing). But if the objective of the measurement is to measure the peak position (x-axis value of the peak), much larger smooth ratios can be employed if desired, because smoothing has little effect on the peak position (unless peak is asymmetrical or the increase in peak width is so much that it causes adjacent peaks to overlap).
In quantitative analysis applications based on calibration by standard samples, the peak height reduction caused by smoothing is not so important. If the same signal processing operations are applied to the samples and to the standards, the peak height reduction of the standard signals will be exactly the same as that of the sample signals and the effect will cancel out exactly. In such cases smooth widths from 0.5 to 1.0 can be used if necessary to further improve the signal-to-noise ratio, as shown in the figure on the left (for a simple sliding-average rectangular smooth). In practical analytical chemistry, absolute peak height measurements are seldom required; calibration against standard solutions is the rule. (Remember: the objective of quantitative analysis is not to measure a signal but rather to measure the concentration of the analyte.) It is very important, however, to apply exactly the same signal processing steps to the standard signals as to the sample signals, otherwise a large systematic error may result.
For a more detailed comparison of all four
smoothing types considered above, see SmoothingComparison.html.
When should you smooth a signal?
are two reasons to smooth a signal:
(a) for cosmetic reasons, to prepare a nicer-looking or more dramatic graphic of a signal for visual inspection or publications, specifically in order to emphasize long-term behavior over short-term, or
(b) if the signal will be subsequently analyzed by a method that would be degraded by the presence of too much high-frequency noise in the signal, for example if the heights of peaks are to be determined visually or graphically or by using the MAX function, or if the location of maxima, minima, or inflection points in the signal is to be determined automatically by detecting zero-crossings in derivatives of the signal. Optimization of the amount and type of smoothing is very important in these cases (see Differentiation.html#Smoothing). But generally, if a computer is available to make quantitative measurements, it's better to use least-squares methods on the unsmoothed data, rather than graphical estimates on smoothed data. If a commercial instrument has the option to smooth the data for you, it's best to disable smoothing that and record the unsmoothed data; you can always smooth it later yourself for visual presentation and it will be better to use the unsmoothed data for an least-squares fitting or other processing that you may want to do later. Smoothing can be used to locate peaks but it should not be used to measure peaks.
Care must be used in the design of algorithms that employ smoothing. For example, in a popular technique for peak finding and measurement, peaks are located by detecting downward zero-crossings in the smoothed first derivative, but the position, height, and width of each peak is determined by least-squares curve-fitting of a segment of original unsmoothed data in the vicinity of the zero-crossing. That way, even if heavy smoothing is necessary to provide reliable discrimination against noise peaks, the peak parameters extracted by curve fitting are not distorted by the smoothing.
When should you NOT smooth a
common situation where you should not
smooth signals is prior to statistical procedures such
curve fitting, because:
(a) smoothing will not significantly improve the accuracy of parameter measurement by least-squares measurements between separate independent signal samples,
(b) all smoothing algorithms are at least slightly "lossy", entailing at least some change in signal shape and amplitude,
(c) it is harder to evaluate the fit by inspecting the residuals if the data are smoothed, because smoothed noise may be mistaken for an actual signal, and
(d) smoothing the signal will seriously underestimate the parameters errors predicted by propagation-of-error calculations and the bootstrap method.
with spikes and outliers. Sometimes
signals are contaminated with very tall, narrow “spikes” or
"outliers" occurring at random intervals and with random
amplitudes, but with widths of only one or a few points. It
not only looks ugly, but it also upsets the assumptions of
least-squares computations because it is not normally-distributed
random noise. This type of interference is difficult to
eliminate using the above smoothing methods without
distorting the signal. However, a “median” filter, which
replaces each point in the signal with the median
(rather than the average) of m adjacent points, can
completely eliminate narrow spikes with little change in the
signal, if the width of the spikes is only one or a few
points and equal to or less than m. See http://en.wikipedia.org/wiki/Median_filter.
The killspikes.m function is
another spike-removing function that uses a different
approach, which locates and eliminates the spikes and
patches over them using linear interpolation from the
signal before and after. Unlike conventional smooths, these
functions can be profitably applied prior to
least-squares fitting functions. (On the other hand, if it's
the spikes that are actually the signal of
interest, and other components of the signal are interfering
with their measurement, see CaseStudies.html#G).
An alternative to smoothing to reduce noise in the above set of unsmoothed signals is ensemble averaging, which can be performed in this case very simply by the Matlab/Octave code plot(x,mean(y)); the result shows a reduction in white noise by about sqrt(10)=3.2. This is enough to judge that there is a single peak with Gaussian shape, which can best be measured by curve fitting (covered in a later section) using the Matlab/Octave code peakfit([x;mean(y)],0,0,1), with the result showing excellent agreement with the position, height, and width of the Gaussian peak created in the third line of the generating script (above left).
Condensing oversampled signals. Sometimes signals are recorded more densely (that is, with smaller x-axis intervals) than really necessary to capture all the important features of the signal. This results in larger-than-necessary data sizes, which slows down signal processing procedures and may tax storage capacity. To correct this, oversampled signals can be reduced in size either by eliminating data points (say, dropping every other point or every third point) or by replacing groups of adjacent points by their averages. The later approach has the advantage of using rather than discarding extraneous data points, and it acts like smoothing to provide some measure of noise reduction. (If the noise in the original signal is white, and the signal is condensed by averaging every n points, the noise is reduced in the condensed signal by the square root of n, but with no change in frequency distribution of the noise).
Video Demonstration. This 18-second, 3 MByte video (Smooth3.wmv) demonstrates the effect of triangular smoothing on a single Gaussian peak with a peak height of 1.0 and peak width of 200. The initial white noise amplitude is 0.3, giving an initial signal-to-noise ratio of about 3.3. An attempt to measure the peak amplitude and peak width of the noisy signal, shown at the bottom of the video, are initially seriously inaccurate because of the noise. As the smooth width is increased, however, the signal-to-noise ratio improves and the accuracy of the measurements of peak amplitude and peak width are improved. However, above a smooth width of about 40 (smooth ratio 0.2), the smoothing causes the peak to be shorter than 1.0 and wider than 200, even though the signal-to-noise ratio continues to improve as the smooth width is increased. (This demonstration was created in Matlab 6.5.
has published a Savitzky-Golay
smooth function in Matlab, which you can download from the Matlab
File Exchange. It's included in the iSignal function.
Pittam has published a modification of the fastsmooth
function that tolerates NaNs (Not a Number) in the data file
and a version for smoothing angle data (nanfastsmoothAngle(Y,w,type,tol)).
This effect is explored more completely by the text below, which shows an experiment in Matlab or Octave that creates a Gaussian peak, smooths it, compares the smoothed and unsmoothed version, then uses the peakfit.m function (version 3.4 or later) to show that smoothing reduces the peak height (from 1 to 0.786) and increases the peak width (from 1.66 to 2.12), but has no effect on the total peak area (as long as you measure the total area under the broadened peak). Smoothing is useful if the signal is contaminated by non-normal noise such as sharp spikes or if the peak height, position, or width are measured by simple methods, but there is no need to smooth the data if the noise is white and the peak parameters are measured by least-squares methods, because the results obtained on the unsmoothed data will be more accurate (see CurveFittingC.html#Smoothing).>> x=[0:.1:10]';
The Matlab/Octave user-defined function medianfilter.m, medianfilter(y,w), performs a median-based filter operation that replaces each value of y with the median of w adjacent points (which must be a positive integer).
ProcessSignal is a Matlab/Octave command-line function that performs smoothing and differentiation on the time-series data set x,y (column or row vectors). It can employ all the types of smoothing described above. Type "help ProcessSignal". Returns the processed signal as a vector that has the same shape as x, regardless of the shape of y. The syntax is Processed=ProcessSignal(x, y, DerivativeMode, w, type, ends, Sharpen, factor1, factor2, SlewRate, MedianWidth)
iSignal is an interactive function for Matlab that performs smoothing for time-series signals using all the algorithms discussed above, including the Savitzky-Golay smooth, a median filter, and a condense function, with keystrokes that allow you to adjust the smoothing parameters continuously while observing the effect on your signal instantly, making it easy to observe how different types and amounts of smoothing effect noise and signal (such as the height, width, and areas of peaks). Other functions include differentiation, peak sharpening, interpolation, least-squares peak measurement, and a frequency spectrum mode that shows how smoothing and other functions can change the frequency spectrum of your signals. The simple script “iSignalDeltaTest” demonstrates the frequency response of iSignal's smoothing functions by applying them to a single-point spike, allowing you to change the smooth type and the smooth width to see how the the frequency response changes. View the code here or download the ZIP file with sample data for testing.
iSignal for Matlab. Click to view larger figures.
you can right-click on any of the m-file links on
this site and select Save
to download them to your computer for use within Matlab.
Unfortunately, iSignal does not currently work in Octave.