
In many experiments in physical science, the true
signal amplitudes (y-axis values) change rather smoothly as a
function of the x-axis values, whereas many kinds of noise are
seen as rapid, random changes in amplitude from point to point
within the signal. In such cases it may be useful
to attempt to reduce the noise by a process called *smoothing*.

**Smoothing algorithms.** Most
smoothing algorithms are based on the "*shift and multiply*"
technique, in which a group of adjacent points in the
original data is multiplied point-by-point by a set of
numbers (coefficients) that defines the smooth shape, the
products are added up to become one point of smoothed data,
then the set of coefficients is shifted one point down the
original data and the process is repeated.
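
Here is a minimal Python sketch of the shift-and-multiply idea, using the standard library only. The function name `shift_and_multiply` is hypothetical and chosen for illustration. The sketch makes three assumptions: the coefficient set has an odd length, each sum is divided by the sum of the coefficients (so the smooth has unit gain), and end points without a complete set of neighbours are left at zero.

```python
def shift_and_multiply(y, coeffs):
    """Smooth y by sliding an odd-length set of coefficients along it.

    Each output point is the coefficient-weighted sum of the adjacent input
    points, divided by the sum of the coefficients (unit gain). Points near
    the ends, where the coefficients would run off the data, stay at 0.
    """
    m = len(coeffs)
    if m % 2 == 0:
        raise ValueError("smooth width must be odd")
    half = m // 2
    total = sum(coeffs)
    out = [0.0] * len(y)
    for j in range(half, len(y) - half):
        acc = 0.0
        for k, c in enumerate(coeffs):
            acc += c * y[j - half + k]   # multiply point-by-point ...
        out[j] = acc / total             # ... add up, normalize to unit gain
    return out


print(shift_and_multiply([0, 0, 3, 0, 0], [1, 1, 1]))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```

Changing only the coefficient list changes the smooth shape. For example, `[1, 2, 3, 2, 1]` gives a weighted smooth instead of a plain average.
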
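The simplest smoothing algorithm is the *rectangular*
or *unweighted sliding-average smooth*; it simply
replaces each point in the signal with the average of *m*
adjacent points, where *m* is a positive integer
called the *smooth width*. For example, for a 3-point
smooth (*m* = 3):

$$S_j = \frac{Y_{j-1} + Y_j + Y_{j+1}}{3}$$

for *j* = 2 to *n*-1, where $S_j$ is the *j*th point of the smoothed signal, $Y_j$ is the *j*th point of the original signal, and *n* is the total number of points in the signal.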
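
The unweighted sliding average can be computed quickly with a running sum. When the window moves one point, the point that leaves is subtracted and the point that enters is added, so the work per point does not grow with *m*. The Python sketch below assumes this approach. The name `rect_smooth` is hypothetical, and the sketch does not describe how any particular function on this page is implemented. Points without a complete window are returned as `None`.

```python
def rect_smooth(y, m):
    """Unweighted sliding-average smooth of odd width m, via a running sum.

    Points that lack a complete m-point window are returned as None.
    """
    if m < 1 or m % 2 == 0:
        raise ValueError("m must be a positive odd integer")
    n, half = len(y), m // 2
    out = [None] * n
    if n < m:
        return out
    window = sum(y[:m])                  # sum of the first complete window
    out[half] = window / m
    for j in range(half + 1, n - half):
        window += y[j + half] - y[j - half - 1]   # add entering, drop leaving
        out[j] = window / m
    return out


print(rect_smooth([1, 2, 3, 4, 10], 3))  # [None, 2.0, 3.0, 5.666..., None]
```
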

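The *triangular smooth* is like the
rectangular smooth, above, except that it implements a *weighted*
smoothing function. For a 5-point smooth (*m* = 5):

$$S_j = \frac{Y_{j-2} + 2Y_{j-1} + 3Y_j + 2Y_{j+1} + Y_{j+2}}{9}$$

for *j* = 3 to *n*-2 (*S* = smoothed signal, *Y* = original signal, *n* = number of points).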
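
For any odd width *m*, the triangular weights rise from 1 to (*m*+1)/2 and fall back to 1. This gives 1 2 3 2 1 for *m* = 5 and 1 2 3 4 3 2 1 for *m* = 7. Each weighted sum is divided by the sum of the weights. Below is a small Python sketch; the helper names are hypothetical, and lost end points are returned as `None`.

```python
def triangular_coeffs(m):
    """Weights 1, 2, ..., (m+1)/2, ..., 2, 1 for an m-point triangular smooth."""
    if m < 1 or m % 2 == 0:
        raise ValueError("m must be a positive odd integer")
    peak = (m + 1) // 2
    return list(range(1, peak + 1)) + list(range(peak - 1, 0, -1))


def triangular_smooth(y, m):
    """Weighted sliding average with unit gain; lost end points are None."""
    c = triangular_coeffs(m)
    total, half = sum(c), m // 2
    out = [None] * len(y)
    for j in range(half, len(y) - half):
        out[j] = sum(ck * y[j - half + k] for k, ck in enumerate(c)) / total
    return out


print(triangular_coeffs(5))                    # [1, 2, 3, 2, 1]
print(triangular_coeffs(7))                    # [1, 2, 3, 4, 3, 2, 1]
print(triangular_smooth([0, 0, 9, 0, 0], 5))   # [None, None, 3.0, None, None]
```
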

It is often useful to apply a smoothing operation more than once, that is, to smooth an already smoothed signal, in order to build longer and more complicated smooths. For example, the 5-point triangular smooth above is equivalent to two passes of a 3-point rectangular smooth.
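
Applying one smooth after another is equivalent to a single smooth whose coefficients are the convolution of the individual coefficient sets. This short Python sketch (hypothetical helper names) confirms that two passes of (1 1 1) give (1 2 3 2 1). It also shows that the combined width grows by *m*-1 with each extra pass of an *m*-point smooth.

```python
def convolve(a, b):
    """Full discrete convolution of two coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for k, bk in enumerate(b):
            out[i + k] += ai * bk
    return out


def combined_coeffs(coeffs, passes):
    """Single set of coefficients equivalent to `passes` repeated smooths."""
    result = [1]
    for _ in range(passes):
        result = convolve(result, coeffs)
    return result


print(combined_coeffs([1, 1, 1], 2))       # [1, 2, 3, 2, 1]
print(combined_coeffs([1, 1, 1], 3))       # [1, 3, 6, 7, 6, 3, 1]
print(len(combined_coeffs([1] * 17, 3)))   # 49 = 3 * (17 - 1) + 1
```

Three passes of a 17-point smooth therefore span 49 points.
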

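In all these smooths, the width of the smooth *m* is chosen to be an odd integer, so that the smooth coefficients are symmetrically balanced around the central point. This is important because it preserves the x-axis positions of peaks and other features in the signal.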

Note that we are assuming here that the x-axis interval of the signal is uniform, that is, that the difference between the x-axis values of adjacent points is the same throughout the signal. This is also assumed in many of the other signal-processing techniques described in this essay, and it is a very common (but not necessary) characteristic of signals that are acquired by automated and computerized equipment.

The Savitzky-Golay smooth is based on the least-squares fitting of polynomials to segments of the data. The algorithm is discussed in http://www.wire.tu-bs.de/OLDWEB/mameyer/cmr/savgol.pdf. Compared to the sliding-average smooths, the Savitzky-Golay smooth is less effective at reducing noise, but more effective at retaining the shape of the original signal. It is capable of differentiation as well as smoothing. The algorithm is more complex and the computational times are greater than the smooth types discussed above, but with modern computers the difference is not significant and code in various languages is widely available online. See SmoothingComparison.html.
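
For the most common case, a quadratic fit (which gives the same smoothing coefficients as a cubic fit), the Savitzky-Golay coefficients have a closed form. The Python sketch below assumes that case only; derivatives and higher polynomial orders need the general least-squares calculation. The helper names are hypothetical. On a parabola, the Savitzky-Golay smooth follows the data exactly, while a 5-point sliding average of the same width is offset by +2 at every point. This illustrates its better retention of signal shape.

```python
def savgol_quadratic_coeffs(half):
    """Savitzky-Golay smoothing coefficients for a window of 2*half + 1 points.

    Closed-form result of a least-squares quadratic fit evaluated at the
    window centre; a cubic fit gives the same smoothing coefficients.
    """
    if half < 1:
        raise ValueError("half must be at least 1")
    norm = (2 * half - 1) * (2 * half + 1) * (2 * half + 3)
    base = 3 * half * half + 3 * half - 1
    return [3 * (base - 5 * i * i) / norm for i in range(-half, half + 1)]


def savgol_smooth(y, half):
    """Apply the coefficients by shift and multiply; lost end points are None."""
    c = savgol_quadratic_coeffs(half)
    out = [None] * len(y)
    for j in range(half, len(y) - half):
        out[j] = sum(ck * y[j - half + k] for k, ck in enumerate(c))
    return out


# 5-point window: coefficients (-3, 12, 17, 12, -3) / 35
print(savgol_quadratic_coeffs(2))

# A parabola y = x^2: SG reproduces it, a 5-point average adds 2 everywhere.
y = [float(i * i) for i in range(10)]
sg = savgol_smooth(y, 2)
rect = [sum(y[j - 2:j + 3]) / 5 for j in range(2, 8)]
print([round(v, 6) for v in sg[2:8]])   # approximately 4, 9, 16, 25, 36, 49
print(rect)                             # 6, 11, 18, 27, 38, 51
```
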

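**Noise reduction.** Smoothing usually reduces the noise in a signal. If the noise is "white" (that is, evenly distributed over all frequencies) and its standard deviation is *s*, then the standard deviation of the noise remaining after one pass of an *m*-point rectangular (unweighted sliding-average) smooth is approximately $s/\sqrt{m}$.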
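
This Python sketch (standard library only) checks numerically that an *m*-point rectangular smooth reduces white noise of standard deviation *s* to about $s/\sqrt{m}$. It smooths pure noise (no signal, *s* = 1) with several smooth widths and compares the remaining standard deviation with the prediction. The helper name `rect_smooth_valid`, the seed, and the number of points are arbitrary assumptions. The measured values shift slightly with a different seed.

```python
import random
import statistics


def rect_smooth_valid(y, m):
    """m-point unweighted sliding average, keeping only fully defined points."""
    return [sum(y[j:j + m]) / m for j in range(len(y) - m + 1)]


random.seed(1)
s = 1.0                                                  # noise standard deviation
noise = [random.gauss(0.0, s) for _ in range(100_000)]   # white noise, zero signal
for m in (3, 9, 25):
    remaining = statistics.pstdev(rect_smooth_valid(noise, m))
    print(f"m={m:2d}  measured {remaining:.3f}  predicted {s / m ** 0.5:.3f}")
```
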

**End effects and the
lost points problem.** Note in the equations above that
the 3-point rectangular smooth is defined only for j = 2 to
n-1. There is not enough data in the signal to define a
complete 3-point smooth for the first point in the signal (j = 1)
or for the last point (j = n), because there are no
data points before the first point or after the last point.
(Similarly, a 5-point smooth is defined only for j = 3 to
n-2, and therefore a smooth cannot be calculated for the
first two points or for the last two points.) In general,
for an *m*-width smooth, there will be (*m*-1)/2
points at the beginning of the signal and (*m*-1)/2
points at the end of the signal for which a complete *m*-width
smooth cannot be calculated. What to do? There are two
approaches. One is to accept the loss of points and trim off
those points or replace them with zeros in the smoothed
signal. (That's the approach taken in most of the figures in
this paper.) The other approach is to use progressively
smaller smooths at the ends of the signal, for example to
use 2, 3, 5, 7... point smooths for signal points 1, 2,
3, and 4..., and for points n, n-1, n-2, n-3...,
respectively. The latter approach may be preferable if the
edges of the signal contain critical information, but it
increases execution time. The fastsmooth
function discussed below can use
either of these two methods.
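
Both end-effect strategies can be sketched in a few lines of Python. The name `rect_smooth_ends` and its `taper` flag are hypothetical. The tapering branch shrinks the window symmetrically to 1, 3, 5, ... points near the ends, so the very first and last points are left unsmoothed. This is a slight simplification of the 2, 3, 5, 7 scheme described above, and it is not the fastsmooth code.

```python
def rect_smooth_ends(y, m, taper=False):
    """m-point sliding average (m odd) with two ways to handle the ends.

    taper=False: points without a full m-point window are set to zero.
    taper=True:  near the ends the window shrinks symmetrically
                 (1, 3, 5, ... points), so every point gets a value.
    """
    if m < 1 or m % 2 == 0:
        raise ValueError("m must be a positive odd integer")
    n, half = len(y), m // 2
    out = [0.0] * n
    for j in range(n):
        h = min(half, j, n - 1 - j)   # largest half-width that fits at point j
        if h < half and not taper:
            continue                  # lost point: leave it at zero
        out[j] = sum(y[j - h:j + h + 1]) / (2 * h + 1)
    return out


y = [1, 2, 3, 4, 5]
print(rect_smooth_ends(y, 3))               # [0.0, 2.0, 3.0, 4.0, 0.0]
print(rect_smooth_ends(y, 3, taper=True))   # [1.0, 2.0, 3.0, 4.0, 5.0]
```
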

**Examples of smoothing.** A
simple example of smoothing is shown in Figure 4. The left
half of this signal is a noisy peak. The right half is the
same peak after undergoing a triangular smoothing algorithm.
The noise is greatly reduced while the peak itself is hardly
changed. Smoothing increases the signal-to-noise ratio and
allows the signal characteristics (peak position, height,
width, area, etc.) to be measured more accurately by visual
inspection.

*Figure 4. The left half of this signal is a noisy
peak. The right half is the same peak after undergoing a smoothing
algorithm. The noise is greatly reduced while the peak
itself is hardly changed, making it easier to measure the
peak position, height, and width directly by graphical or
visual estimation (but it does not improve measurements made
by least-squares methods; see below).*

The larger the smooth width, the greater the
noise reduction, but also the greater the possibility that
the signal will be *distorted* by the smoothing
operation. The optimum choice of smooth width depends upon
the width and shape of the signal and the digitization
interval. For peak-type signals, the critical factor is the
*smoothing ratio*, the ratio between the smooth width *m*
and the number of points in the half-width of the peak. In
general, increasing the smoothing ratio improves the
signal-to-noise ratio but causes a reduction in amplitude
and an increase in the width of the peak.

The figures above show examples of the effect of
three different smooth widths on noisy Gaussian-shaped
peaks. In the figure on the left, the peak has a (true)
height of 2.0 and there are 80 points in the half-width of
the peak. The red line is the original unsmoothed peak. The
three superimposed green lines are the results of smoothing
this peak with a triangular smooth of width (from top to
bottom) 7, 25, and 51 points. Because the peak width is 80
points, the *smooth ratios* of these three smooths are
7/80 = 0.09, 25/80 = 0.31, and 51/80 = 0.64, respectively.
As the smooth width increases, the noise is progressively
reduced but the peak height also is reduced slightly. For
the largest smooth, the peak width is slightly increased. In
the figure on the right, the original peak (in red) has a
true height of 1.0 and a half-width of 33 points. (It is
also less noisy than the example on the left.) The three
superimposed green lines are the results of the same three
triangular smooths of width (from top to bottom) 7, 25, and
51 points. But because the peak width in this case is only
33 points, the *smooth ratios* of these three smooths
are larger: 0.21, 0.76, and 1.55, respectively. You can see
that the peak distortion effect (reduction of peak height
and increase in peak width) is greater for the narrower peak
because the smooth ratios are higher. Smooth ratios of
greater than 1.0 are seldom used because of excessive peak
distortion. Note that even in the worst case, the peak
positions are not affected (assuming that the original peaks
were symmetrical and not overlapped by other peaks). If
retaining the shape of the peak is more important than
optimizing the signal-to-noise ratio, the Savitzky-Golay smooth has
the advantage over sliding-average smooths.
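
The noise-free trend in these figures can be reproduced with a short Python sketch (standard library only). It applies triangular smooths of width 7, 25, and 51 to a Gaussian of height 1.0 with a half-width (FWHM) of 33 points, which are the smooth ratios of the right-hand figure. It then reports the smoothed height, half-width, and peak position. The helper names and the linear-interpolation half-width estimate are hypothetical illustrative choices, not the code used to make the figures. Expect the height to fall and the half-width to grow as the smooth ratio rises, while the peak index stays at 200.

```python
import math


def gaussian(n, center, height, fwhm):
    """Noise-free Gaussian sampled at x = 0, 1, ..., n-1."""
    return [height * math.exp(-4 * math.log(2) * ((i - center) / fwhm) ** 2)
            for i in range(n)]


def triangular_smooth(y, m):
    """m-point triangular smooth (m odd), unit gain; lost end points are 0."""
    peak = (m + 1) // 2
    c = list(range(1, peak + 1)) + list(range(peak - 1, 0, -1))
    total, half = sum(c), m // 2
    out = [0.0] * len(y)
    for j in range(half, len(y) - half):
        out[j] = sum(ck * y[j - half + k] for k, ck in enumerate(c)) / total
    return out


def fwhm(y):
    """Full width at half maximum, interpolating linearly between points."""
    top = max(y)
    i_max = y.index(top)
    half = top / 2
    left = i_max
    while y[left] > half:          # walk left until at or below half maximum
        left -= 1
    right = i_max
    while y[right] > half:         # walk right likewise
        right += 1
    x_left = left + (half - y[left]) / (y[left + 1] - y[left])
    x_right = right - (half - y[right]) / (y[right - 1] - y[right])
    return x_right - x_left


y = gaussian(401, 200, 1.0, 33)    # height 1.0, half-width 33 points
for m in (7, 25, 51):
    s = triangular_smooth(y, m)
    print(f"smooth width {m:2d}  ratio {m / 33:.2f}  height {max(s):.3f}  "
          f"half-width {fwhm(s):.1f}  peak index {s.index(max(s))}")
```
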

It's important to point out that smoothing results
such as illustrated in the figures above may be deceptively optimistic
because they employ a single
sample of a noisy signal that is smoothed to
different degrees. Smoothing is essentially a type of
low-pass filtering that reduces the *high-frequency*
components of a signal while retaining the *low-frequency*
components. This causes the viewer to overestimate the
quality of a smoothed noisy signal, because one tends to underestimate the
contribution of the remaining low-frequency noise,
which is hard to estimate visually because there are so few
low-frequency cycles in the signal record. This error can be
remedied by taking a number of independent samples of noisy
signal, as illustrated in the two figures below, which show
10 superimposed plots of a noisy peak, unsmoothed on the
left and smoothed on the right. Inspection of the smoothed
signals clearly shows the variation in peak position,
height, and width caused by the low-frequency noise
remaining in the smoothed signals. The Matlab scripts are
shown in the figure titles.

The figure on the right below is another example signal that illustrates some of these principles. The signal consists of two Gaussian peaks, one located at x=50 and the second at x=150. Both peaks have a peak height of 1.0 and a peak half-width of 10, and normally-distributed random white noise with a standard deviation of 0.1 has been added to the entire signal. The x-axis sampling interval, however, is different for the two peaks: it's 0.1 for the first peak and 1.0 for the second peak. This means that the first peak is characterized by ten times more points than the second peak (100 points in its half-width versus 10). It may look like the first peak is noisier than the second, but that's just an illusion; the signal-to-noise ratio for both peaks is 10 (peak height 1.0 divided by noise standard deviation 0.1). The second peak looks less noisy only because there are fewer noise samples there and we tend to underestimate the dispersion of small samples. As a result, when the signal is smoothed, the second peak is much more likely to be distorted by the smooth (it becomes shorter and wider) than the first peak. The first peak can tolerate a much wider smooth width, because a given smooth width in points corresponds to a ten times smaller smooth ratio there, resulting in a greater degree of noise reduction. (Similarly, if both peaks are measured with the peakfit method, the results on the first peak will be about 3 times more accurate than those on the second peak, because there are 10 times more data points in that peak and the measurement precision improves roughly with the square root of the number of data points if the noise is white; √10 ≈ 3.2.) You can download the data file "udx" in TXT format or in Matlab MAT format.

**Optimization of smoothing.** Which is the best smooth ratio? It depends on the purpose of the peak measurement. If the objective of the measurement is to measure the true peak height and width, then smooth ratios below 0.2 should be used. (In the example on the left above, the original peak (red line) has a peak height greater than the true value 2.0 because of the noise, whereas the smoothed peak with a smooth ratio of 0.09 has a peak height that is much closer to the correct value.) Measuring the height of noisy peaks is much better done by curve fitting the unsmoothed data than by taking the maximum of the smoothed data (see CurveFittingC.html#Smoothing). But if the objective of the measurement is to measure the peak position (x-axis value of the peak), much larger smooth ratios can be employed if desired, because smoothing has no effect on the position of a symmetrical peak (unless the increase in peak width is so great that it causes adjacent peaks to overlap).

In quantitative analysis applications, the peak
height reduction caused by smoothing is not so important,
because in most cases calibration is based on the signals of
standard samples. If the same
signal processing operations are applied to the samples and
to the standards, the peak height reduction of the standard
signals will be exactly the same as that of the sample
signals and the effect will cancel out exactly. In such
cases smooth ratios from 0.5 to 1.0 can be used if necessary
to further improve the signal-to-noise ratio. In practical
analytical chemistry, absolute peak height measurements are
seldom required; calibration against standard solutions is
the rule. (Remember: the objective of quantitative
analysis is not to measure a signal but rather to measure
the concentration of the analyte.) It is very important,
however, to apply *exactly* the same signal processing
steps to the standard signals as to the sample signals,
otherwise a large systematic error may result.
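
The cancellation works because smoothing is a linear operation: a sample and a standard with the same peak shape are reduced by the same factor. In the Python sketch below, the heights of the standard (2.0) and the sample (0.7) are hypothetical, and the peak shapes are assumed identical. A smooth ratio of about 1 shrinks both peaks, but the ratio of their smoothed heights still equals the ratio of their true heights.

```python
import math


def gaussian(n, center, height, fwhm):
    """Noise-free Gaussian peak sampled at x = 0, 1, ..., n-1."""
    return [height * math.exp(-4 * math.log(2) * ((i - center) / fwhm) ** 2)
            for i in range(n)]


def triangular_smooth(y, m):
    """m-point triangular smooth (m odd), unit gain; lost end points are 0."""
    peak = (m + 1) // 2
    c = list(range(1, peak + 1)) + list(range(peak - 1, 0, -1))
    total, half = sum(c), m // 2
    out = [0.0] * len(y)
    for j in range(half, len(y) - half):
        out[j] = sum(ck * y[j - half + k] for k, ck in enumerate(c)) / total
    return out


# Hypothetical standard and sample: same peak shape, different heights.
standard = gaussian(201, 100, 2.0, 20)
sample = gaussian(201, 100, 0.7, 20)

m = 21  # smooth ratio of about 1, enough to shrink the peaks noticeably
h_std = max(triangular_smooth(standard, m))
h_smp = max(triangular_smooth(sample, m))
print(f"standard: 2.0 -> {h_std:.3f}, sample: 0.7 -> {h_smp:.3f}")
print(f"sample/standard ratio after smoothing: {h_smp / h_std:.4f}")  # 0.3500
```
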

For a comparison of all four smoothing types
considered above, see SmoothingComparison.html.

**When should you smooth a signal?** There are two reasons to smooth a signal: (1) for cosmetic reasons, to prepare a nicer-looking graphic of a signal for visual inspection or publication, and (2) if the signal will subsequently be processed by an algorithm that would be adversely affected by the presence of too much high-frequency noise in the signal. Examples are determining peak heights graphically or with the MAX function, or automatically locating maxima, minima, or inflection points by detecting zero-crossings in derivatives of the signal. Optimization of the amount and type of smoothing is very important in these cases (see Differentiation.html#Smoothing).

Care must be used in the design of algorithms that employ smoothing. For example, in a popular technique for peak finding and measurement, peaks are located by detecting downward zero-crossings in the smoothed first derivative, but the position, height, and width of each peak are determined by least-squares curve-fitting of a segment of original unsmoothed data in the vicinity of the zero-crossing. Thus, even if heavy smoothing is necessary to provide reliable discrimination against noise peaks, the peak parameters extracted by curve fitting are not distorted by the smoothing.
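
The locating step can be sketched in Python as follows. The sketch assumes a triangular smooth and a central-difference first derivative. It also adds a simple amplitude threshold so that small noise bumps in the baseline are ignored. The function name `locate_peaks`, the threshold, and the two-peak test signal are hypothetical. The curve-fitting step on the unsmoothed data is not shown.

```python
import math
import random


def triangular_smooth(y, m):
    """m-point triangular smooth (m odd), unit gain; lost end points are 0."""
    peak = (m + 1) // 2
    c = list(range(1, peak + 1)) + list(range(peak - 1, 0, -1))
    total, half = sum(c), m // 2
    out = [0.0] * len(y)
    for j in range(half, len(y) - half):
        out[j] = sum(ck * y[j - half + k] for k, ck in enumerate(c)) / total
    return out


def locate_peaks(y, m, amp_threshold):
    """Indices of downward zero-crossings in the smoothed first derivative."""
    s = triangular_smooth(y, m)
    # central-difference first derivative of the smoothed signal (x spacing 1)
    d = [0.0] + [(s[j + 1] - s[j - 1]) / 2 for j in range(1, len(s) - 1)] + [0.0]
    peaks = []
    for j in range(len(d) - 1):
        if d[j] > 0 >= d[j + 1]:                 # slope goes from + to - or 0
            k = j if abs(d[j]) < abs(d[j + 1]) else j + 1
            if s[k] > amp_threshold:             # reject small noise bumps
                peaks.append(k)
    return peaks


# Hypothetical test signal: peaks at x = 100 and x = 300 plus white noise.
random.seed(3)
y = [math.exp(-((i - 100) / 24) ** 2) + 0.6 * math.exp(-((i - 300) / 24) ** 2)
     + random.gauss(0.0, 0.05) for i in range(400)]
print(locate_peaks(y, 31, amp_threshold=0.3))   # indices near 100 and 300
```
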

**When should you NOT smooth a
signal?** One common situation where you should not
smooth signals is prior to statistical procedures such
as least-squares curve fitting, because: (a) smoothing will not
significantly improve the accuracy of parameters measured
by least-squares fitting, as judged by the variation between separate
independent signal samples; (b) all smoothing algorithms are
at least slightly "lossy", entailing at least some
change in signal shape and amplitude; (c) it is harder to
evaluate the fit by inspecting the residuals if the data are
smoothed, because smoothed noise may be mistaken for an actual
signal; and (d) smoothing the signal will cause
propagation-of-error calculations and the bootstrap method to
seriously underestimate the parameter errors.
Smoothing can be used to *locate* peaks,
but it should not be used to *measure* peaks.

**Dealing with spikes.** Sometimes signals are contaminated with
very tall, narrow “spikes” occurring at random intervals and
with random amplitudes, but with widths of only one or a few
points. Such spikes not only look ugly, but they also upset the
assumptions of least-squares computations because they are not
normally-distributed random noise. This type of interference
is difficult to eliminate using the above smoothing methods
without distorting the signal. However, a “median” filter,
which replaces each point in the signal with the *median*
(rather than the average) of *m* adjacent points, can
completely eliminate narrow spikes with little change in the
signal, if the width of the spikes is only one or a few
points and equal to or less than *m*. See http://en.wikipedia.org/wiki/Median_filter.
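
Here is a minimal median-filter sketch in Python (standard library only). The name `median_filter` is hypothetical, and so is the choice to shrink the window symmetrically near the ends so that every point gets a value. On a straight ramp with two one-point spikes, a 3-point median removes the spikes and leaves the rest of the ramp nearly unchanged. A 3-point average would instead spread each spike over three points.

```python
import statistics


def median_filter(y, m):
    """Replace each point with the median of the m points centred on it (m odd).

    Near the ends the window shrinks symmetrically so every point gets a value.
    """
    if m < 1 or m % 2 == 0:
        raise ValueError("m must be a positive odd integer")
    n, half = len(y), m // 2
    out = []
    for j in range(n):
        h = min(half, j, n - 1 - j)
        out.append(statistics.median(y[j - h:j + h + 1]))
    return out


ramp_with_spikes = [0, 1, 2, 3, 50, 5, 6, 7, 8, -40, 10]
print(median_filter(ramp_with_spikes, 3))   # [0, 1, 2, 3, 5, 6, 6, 7, 7, 8, 10]
```
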

**Condensing
oversampled signals.** Sometimes signals are recorded
more densely (that is, with smaller x-axis intervals) than
really necessary to capture all the important features of
the signal. This results in larger-than-necessary data
sizes, which slows down signal processing procedures and may
tax storage capacity. To correct this, oversampled
signals can be reduced in size either by eliminating data
points (say, dropping every other point or every third
point) or by replacing groups of adjacent points by their
averages. The latter approach has the advantage of using rather than discarding extraneous
data points, and it acts like smoothing to provide some
measure of noise reduction. (If the noise in the original
signal is white, and the signal is condensed by averaging
every n points,
the noise is reduced in the condensed signal by the square
root of n, but
with *no change* in frequency distribution of the
noise).
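
Here is a minimal Python sketch of condensing by averaging. The name `condense` mirrors the Matlab function described later on this page, but this is an independent sketch. In particular, dropping any leftover points at the end (fewer than *n*) is an assumption, not necessarily what condense.m does. The second part checks the square-root-of-*n* noise reduction on simulated white noise.

```python
import random
import statistics


def condense(y, n):
    """Replace each group of n adjacent points by its average.

    Leftover points at the end (fewer than n) are dropped.
    """
    return [sum(y[i:i + n]) / n for i in range(0, len(y) - n + 1, n)]


print(condense([1, 3, 5, 7, 9, 11, 13], 2))   # [2.0, 6.0, 10.0]

# Averaging groups of n points of white noise cuts its standard deviation
# by about sqrt(n).
random.seed(2)
noise = [random.gauss(0.0, 1.0) for _ in range(90_000)]
print(statistics.pstdev(condense(noise, 9)))  # close to 1/sqrt(9) = 0.333
```
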

**Video Demonstration.**
This 18-second, 3 MByte video (Smooth3.wmv)
demonstrates the effect of triangular smoothing on a single
Gaussian peak with a peak height of 1.0 and peak width of
200. The initial white noise amplitude is 0.3, giving an
initial signal-to-noise ratio of about 3.3. Attempts to
measure the peak amplitude and peak width of the noisy
signal, shown at the bottom of the video, are initially
seriously inaccurate because of the noise. As the smooth
width is increased, however, the signal-to-noise ratio
improves and the measurements of peak amplitude and peak
width become more accurate. However, above a
smooth width of about 40 (smooth ratio 0.2), the smoothing
causes the peak to be shorter than 1.0 and wider than 200,
even though the signal-to-noise ratio continues to improve
as the smooth width is increased. (This demonstration was
created in Matlab 6.5.)

SPECTRUM, the freeware Macintosh signal-processing application, includes rectangular and triangular smoothing functions for any number of points.

**Spreadsheets.** Smoothing can be done in spreadsheets using the "shift and multiply" technique described above. In the spreadsheets smoothing.ods and smoothing.xls the set of multiplying coefficients is contained in the formulas that calculate the values of each cell of the smoothed data in columns C and E. Column C performs a 7-point rectangular smooth (1 1 1 1 1 1 1) and column E does a 7-point triangular smooth (1 2 3 4 3 2 1), applied to the data in column A. You can type in (or Copy and Paste) any data you like into column A, and you can extend the spreadsheet to longer columns of data by dragging the last row of columns A, C, and E down as needed. But to change the smooth width, you would have to change the equations in columns C or E and copy the changes down the entire column. It's common practice to divide the results by the sum of the coefficients so that the net gain is unity and the area under the curve of the smoothed signal is preserved. The spreadsheets UnitGainSmooths.xls and UnitGainSmooths.ods contain a collection of unit-gain convolution coefficients for rectangular and triangular smooths of width 3 to 29 that you can Copy and Paste into your own spreadsheets.

The spreadsheets MultipleSmoothing.xls and MultipleSmoothing.ods demonstrate a more flexible method in which the coefficients are contained in a group of 17 adjacent cells (in row 5, columns I through Y), making it easier to change the smooth shape and width (up to a maximum of 17). In this spreadsheet, the smooth is applied three times in succession, resulting in an effective smooth width of 49 points applied to column G.

Compared to Matlab/Octave, spreadsheets are much slower, less flexible, and less easily automated. For example, in these spreadsheets, to change the signal or the number of points in the signal, or to change the smooth width or type, you have to modify the spreadsheet in several places, whereas to do the same using the Matlab/Octave "fastsmooth" function (below), you need only change the input arguments of a single line of code. And combining several different techniques into one spreadsheet is more complicated than writing a Matlab/Octave script that does the same thing.

Diederick
has published a Savitzky-Golay
smooth function in Matlab, which you can download from the Matlab
File Exchange. It's included in the iSignal function.

Here's a simple experiment in Matlab or Octave that creates a Gaussian peak, smooths it, compares the smoothed and unsmoothed version, then uses the peakfit.m function (version 3.4 or later) to show that smoothing reduces the peak height (from 1 to 0.786) and increases the peak width (from 1.66 to 2.12), but has little effect on the total peak area (a mere 0.2% change). Smoothing is useful if the signal is contaminated by non-normal noise such as sharp spikes or if the peak height, position, or width are measured by simple methods, but there is no need to smooth the data if the noise is white and the peak parameters are measured by least-squares methods, because the results obtained on the unsmoothed data will be more accurate (see CurveFittingC.html#Smoothing).

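```matlab
>> x=[0:.1:10]';                  % x from 0 to 10 in steps of 0.1 (column vector)
>> y=exp(-(x-5).^2);              % Gaussian peak of height 1 centered at x = 5
>> plot(x,y)
>> ysmoothed=fastsmooth(y,11,3,1); % 11-point smooth; see "help fastsmooth" for the type and ends arguments
>> plot(x,y,x,ysmoothed,'r')      % original and smoothed (red) signals
>> [FitResults,FitError]=peakfit([x y])

FitResults =
    Peak#    Position    Height    Width     Area
      1         5          1       1.6651    1.7725

FitError =
    3.817e-005

>> [FitResults,FitError]=peakfit([x ysmoothed])

FitResults =
      1         5       0.78608    2.1224    1.7759

FitError =
    0.13409
```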
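
For readers without Matlab, here is a rough Python analogue of the same point (standard library only). It uses the same Gaussian, smoothed by three passes of an 11-point unit-gain sliding average. That smooth is an assumption standing in for the fastsmooth call, and the sketch is not claimed to reproduce the numbers above. The peak height drops noticeably, while the area (here a simple sum times the x interval) is essentially unchanged.

```python
import math


def rect_pass(y, m):
    """One unit-gain m-point sliding average; points without a full window are 0."""
    half = m // 2
    out = [0.0] * len(y)
    for j in range(half, len(y) - half):
        out[j] = sum(y[j - half:j + half + 1]) / m
    return out


dx = 0.1
x = [i * dx for i in range(101)]            # 0 to 10 in steps of 0.1
y = [math.exp(-(xi - 5) ** 2) for xi in x]  # Gaussian of height 1 at x = 5
ys = y
for _ in range(3):                          # three passes of an 11-point smooth
    ys = rect_pass(ys, 11)

area = sum(y) * dx                          # rectangle-rule areas
area_s = sum(ys) * dx
print(f"height {max(y):.3f} -> {max(ys):.3f}")
print(f"area   {area:.4f} -> {area_s:.4f}")
```
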

The Matlab/Octave user-defined function condense.m, condense(y,n), returns a condensed version of y in which each group of n points is replaced by its average, reducing the length of y by the factor n. (For x,y data sets, use this function on both independent variable x and dependent variable y so that the features of y will appear at the same x values).

The Matlab/Octave user-defined function medianfilter.m, medianfilter(y,w), performs a median-based filter operation that replaces each value of y with the median of w adjacent points (w must be a positive integer).

ProcessSignal is a Matlab/Octave command-line function that performs smoothing and differentiation on the time-series data set x,y (column or row vectors). It can employ all the types of smoothing described above. Type "help ProcessSignal" for details. It returns the processed signal as a vector that has the same shape as x, regardless of the shape of y. The syntax is Processed=ProcessSignal(x, y, DerivativeMode, w, type, ends, Sharpen, factor1, factor2, SlewRate, MedianWidth)

iSignal is an interactive function for Matlab that performs smoothing for time-series signals using all the algorithms discussed above, including the Savitzky-Golay smooth, with keystrokes that allow you to adjust the smoothing parameters continuously while observing the effect on your signal instantly. Version 2.2 also includes a median filter and a condense function. Other functions include differentiation, peak sharpening, and least-squares peak measurement. View the code here or download the ZIP file with sample data for testing.

iSignal for Matlab.

Note:
you can right-click on any of the m-file links on
this site and select Save
Link As...
to download them to your computer for use within Matlab.
Unfortunately, iSignal does not currently work in Octave.

Last updated February, 2014. This page is maintained by Prof. Tom O'Haver, Department of Chemistry and Biochemistry, The University of Maryland at College Park. Comments, suggestions and questions should be directed to Prof. O'Haver at toh@umd.edu.
