[Area vs height] [Historical methods] [perpendicular drop] [Spreadsheets] [Matlab/Octave] [measurepeaks.m] [Baseline correction] [Broadening and asymmetry] [Interactive tools]

The symbolic integration of functions and the calculation of definite integrals are topics that are introduced in elementary calculus courses. The numerical integration of digitized signals finds application in analytical signal processing mainly as a method for measuring the areas under the curves of peak-type signals. Peak area measurements are very important in chromatography, a class of chemical measurement techniques in which a mixture of components is made to flow through a chemically-prepared tube or layer that allows some of the components in the mixture to travel faster than others, followed by a device called a

If the detector response is linear with respect to the concentration of the material only at

On the other hand, peak height measurements are

Chromatographic peaks are often described as a Gaussian function or as a

Before computers, there were several methods used to compute peak areas that sound strange by today's standards:

(a) plot the signal on a paper chart, cut out the peak with scissors, then weigh the cut-out piece on a micro-balance compared to a square section of known area;

(b) count the grid squares under a curve recorded on gridded graph paper;

(c) use a mechanical ball-and-disk integrator;

(d) use geometry to compute the area under a triangle constructed with its sides tangent to the sides of the peak; or

(e) compute the cumulative sum of the signal magnitude and measure the heights of the resulting steps (see figure below).

But now that computing power is built into or connected to every measuring instrument, more accurate and convenient digital methods can be employed. However it is measured, the

The best method for calculating the area under a peak depends on whether the peak is isolated, overlapped with other peaks, or superimposed on a non-zero baseline. Simple numerical integration of a digital signal, for example by Simpson's rule, converts a series of peaks into a series of steps, the height of each of which is proportional to the area under that peak. But this works well only if the peaks are well separated from each other and if the baseline is zero. This is a commonly used method in proton NMR spectroscopy, where the area under each peak or multiplet is proportional to the number of equivalent hydrogen atoms responsible for that peak.
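The peaks-to-steps idea can be tried out with a short Matlab/Octave sketch. It uses a plain rectangular cumulative sum (cumsum) rather than Simpson's rule, and the two synthetic peaks, their positions, and the variable names are illustrative assumptions, not taken from the figure.

```octave
% Two well-separated Gaussian peaks on a zero baseline (synthetic test data)
x = 0:0.01:20;
y = exp(-(x-5).^2) + 2*exp(-(x-15).^2);
dx = x(2) - x(1);

% Running integral: each peak turns into a step
c = cumsum(y)*dx;

% Read the step heights from the flat region between the peaks (x = 10)
mid = find(x >= 10, 1);
step1 = c(mid);             % height of the first step
step2 = c(end) - c(mid);    % height of the second step

fprintf('Step 1 = %.4f, step 2 = %.4f, sqrt(pi) = %.4f\n', step1, step2, sqrt(pi));
```

Because the second peak is twice as high as the first and has the same width, its step should come out about twice as high. Adding a constant offset to y turns the flat parts of the cumulative sum into ramps, which is why the method needs a zero baseline.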

The classical way to handle the overlapping peak problem is to draw two vertical lines from the left and right bounds of the peak down to the x-axis and then to measure the total area bounded by the signal curve, the x-axis (y=0 line), and the two vertical lines, shown as the shaded area in the figure on the left, below. This is often called the *perpendicular drop* method; it's an easy task for a computer, although tedious to do by hand. The left and right bounds of the peak are usually taken as the valleys (minima) between the peaks or as the points half-way between the peak center and the centers of the peaks to the left and right. Using this method it is possible to estimate the area of the second peak to an accuracy of about 0.3% and the last two peaks to an accuracy of better than 4%. (A slight improvement in the accuracy of the measured areas of the third and fourth peaks can be obtained by applying the peak sharpening technique to narrow the peaks before the perpendicular drop measurement.)

*Peak area measurement for overlapping peaks, using the perpendicular drop method (left, shaded area)*
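A minimal Matlab/Octave sketch of the perpendicular drop idea follows. It assumes a smooth, noiseless signal, so that the valleys can be found by a simple neighbor comparison. The four-Gaussian test signal and the variable names are assumptions for illustration, and this is not the code used to produce the figure.

```octave
% Four Gaussian peaks of equal area; the last two overlap partially
x = 0:0.01:18;
y = exp(-(x-4).^2) + exp(-(x-9).^2) + exp(-(x-12).^2) + exp(-(x-13.7).^2);

% Valleys: interior points lower than the left neighbor and no higher
% than the right neighbor (adequate only for smooth, noiseless data)
inner = y(2:end-1);
valleys = find(inner < y(1:end-2) & inner <= y(3:end)) + 1;

% Peak bounds: start of signal, each valley, end of signal
bounds = [1, valleys, numel(x)];

% Drop perpendiculars at the bounds and integrate each segment down to y=0
areas = zeros(1, numel(bounds)-1);
for k = 1:numel(areas)
  idx = bounds(k):bounds(k+1);
  areas(k) = trapz(x(idx), y(idx));
end
disp(areas)
```

Each of these Gaussians has a theoretical area of sqrt(pi), so the printed values can be compared directly against 1.7725. With noisy data the neighbor comparison would find many false valleys, so the signal would need smoothing or a more robust valley search first.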

However, that simple method works well only if the peaks are symmetrical, not too different in height, not too highly overlapped (as is the case for the first two peaks in this example), and not superimposed on a background whose area is not to be included. In the case where a peak is superimposed on a straight or broadly curved baseline, you might use the *tangent skim method*, which measures the area between the curve and a linear baseline drawn across the bottom of the peak (e.g. the *shaded area* in the figure on the right, above). In general, the hardest part of the problem and the greatest source of uncertainty is determining the shape of the baseline under the peaks and determining where each peak begins and ends. Once those are determined, you subtract the baseline from each point between the start and end points, add up the differences, and multiply by the x-axis interval. Incidentally, smoothing a noisy signal does not change the areas under the peaks, but it may make the peak start and stop points easier to determine. The downside of smoothing is that it increases peak width and the overlap between adjacent peaks. Numerical methods of peak sharpening, for example derivative sharpening and Fourier deconvolution, can help with the problem of peak overlap, and both of these techniques have the useful property that they do not change the area under the peaks.
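The subtract, add, and multiply recipe for a linear (tangent skim) baseline might look like this in Matlab/Octave. The synthetic peak, the sloping baseline, and the start and end points at x=1 and x=9 are all assumptions chosen for illustration.

```octave
% One Gaussian peak sitting on a sloping straight-line baseline
x = 0:0.01:10;
y = exp(-(x-5).^2) + 0.5 + 0.1*x;
dx = x(2) - x(1);

% Assumed start and end points of the peak
i1 = find(x >= 1, 1);
i2 = find(x >= 9, 1);
seg = i1:i2;

% Linear baseline drawn between the signal values at the two end points
bl = interp1([x(i1) x(i2)], [y(i1) y(i2)], x(seg));

% Subtract the baseline, add up the differences, multiply by the x interval
area = sum(y(seg) - bl)*dx;
fprintf('Tangent skim area = %.4f (theory %.4f)\n', area, sqrt(pi));
```

The result is only as good as the choice of end points: if they are placed where the peak has not yet returned to the baseline, the drawn line cuts off part of the peak and the area is underestimated.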

If the *shape* of the peaks is known, the best way to measure the areas of overlapping peaks is to use some type of least-squares curve fitting, as is discussed in the three following sections (A, B, C). If the peak positions, widths, and amplitudes are unknown, and only the fundamental peak shapes are known, then the iterative least-squares method can be employed. In many cases, even the background can be accounted for by curve fitting.
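As a hedged illustration of the simplest case, the sketch below assumes that the peak shapes, positions, and widths are all known and only the heights are unknown, so that ordinary (non-iterative) linear least-squares is enough. The positions, widths, and heights are made-up test values; the iterative method mentioned above is needed when positions and widths must be found as well.

```octave
% Two heavily overlapped Gaussians with known positions and widths
x = 0:0.05:10;
pos = [4 5.5];            % assumed known peak positions
wid = [1 1];              % assumed known width parameters
true_h = [1 0.5];         % heights used to synthesize the test signal

% Each column of G is one unit-height peak shape
G = [exp(-((x-pos(1))/wid(1)).^2); exp(-((x-pos(2))/wid(2)).^2)]';
y = G*true_h';            % synthetic noiseless signal

% Least-squares estimate of the heights
h = G\y;

% The area of h*exp(-((x-p)/w).^2) is h*w*sqrt(pi)
areas = h' .* wid * sqrt(pi);
disp(areas)
```

Because the fit uses the whole shape of each peak rather than a drawn boundary between them, the areas are recovered even though the two peaks do not show a clear valley.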

For gas chromatography and mass spectrometry specifically, **Philip Wenig's OpenChrom** is an *open source* data system that can import binary and textual chromatographic data files directly. It includes methods to detect baselines and to measure peak areas in a chromatogram. Extensive documentation is available. It is available for Windows, Linux, Solaris and Mac OS X. A screen shot is shown on the left. The program and its documentation are regularly updated by the author.

Another freely-available open-source program for mass
spectroscopy is "Skyline"
from MacCoss Lab Software,
which is specifically aimed at reaction monitoring. Tutorials and
videos are available.

SPECTRUM, the freeware signal-processing application for Macintosh OS8, includes an integration function, as well as peak area measurement by perpendicular drop or tangent skim methods, with mouse-controlled setting of start and stop points.

Peak area measurement using spreadsheets.

EffectOfDx.xlsx (screen image) demonstrates that the simple equation sum(y)*dx accurately measures the peak area of an isolated Gaussian peak if there are at least 4 or 5 points visibly above the baseline and as long as you include the points out to plus and minus at least 2 or 3 standard deviations of the Gaussian. It also shows that an exponentially broadened Gaussian needs to include more points on the tailing (right-hand, in this case) side to achieve the best accuracy. EffectOfNoiseAndBaseline.xlsx (screen image) demonstrates the effect of random noise and non-zero baseline, showing that the area is more sensitive to a non-zero baseline than to the same amount of random noise.
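The same sum(y)*dx experiment can be sketched in Matlab/Octave instead of a spreadsheet. This is not a port of EffectOfDx.xlsx; the sampling intervals and ranges below are assumptions chosen to show the two effects described above.

```octave
% Gaussian exp(-x.^2): standard deviation 1/sqrt(2), true area sqrt(pi)
true_area = sqrt(pi);

% Effect of the sampling interval, with a range of about +/- 4 standard deviations
dxs = [0.1 0.5 1];
rel_err = zeros(size(dxs));
for k = 1:numel(dxs)
  x = -3:dxs(k):3;
  rel_err(k) = (sum(exp(-x.^2))*dxs(k) - true_area)/true_area;
  fprintf('dx = %.1f: %d points, relative error %.2e\n', dxs(k), numel(x), rel_err(k));
end

% Effect of the range: fine sampling, but only about +/- 1 standard deviation
xn = -0.7:0.1:0.7;
narrow_area = sum(exp(-xn.^2))*0.1;
fprintf('Narrow range: area %.4f vs true %.4f\n', narrow_area, true_area);
```

Even the coarsest interval here gives a small relative error, whereas cutting the range short loses a large fraction of the area, which suggests that the integration range matters more than the point density for an isolated smooth peak.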

CumulativeSum.xls (screen image) illustrates integration of a peak-type signal by normalized cumulative sum; you can paste your own data into columns A and B. CumulativeSumExample.xls is an example with data. The

EffectOfNoiseAndBaselineNormalVsPower.xlsx (screen image) demonstrates the effect of the power sharpening method on area measurements of Gaussian and exponentially broadened Gaussian peaks, including the different effects of random noise and non-zero baseline. It shows that higher values of power (cell O9) reduce the peak width, make the EMG peak more Gaussian, reduce the effect of the background (cell B6), and reduce the noise (cell B5) on the baseline but

Peak area measurement using Matlab and Octave.

Matlab and Octave have built-in commands for the sum of elements (“sum”), the cumulative sum (“cumsum”), trapezoidal numerical integration (“trapz”), and adaptive Simpson quadrature (“quad”). For example, these three Matlab commands

>> x=-5:.1:5;

>> y=exp(-(x).^2);

>> trapz(x,y)

accurately compute the area under the curve of x,y (in this case an isolated Gaussian, whose area is theoretically known to be the square root of pi, sqrt(pi), which is 1.7725). If the interval between x values, dx, is

But the peaks in real signals have some complications:

(a) their shapes might not be known;

(b) they may be superimposed on a baseline; and

(c) they may be overlapped with other peaks.

These must be taken into account to measure accurate areas.

[M,A]=autopeaks.m is basically a combination of autofindpeaks.m and measurepeaks.m. It has similar syntax to measurepeaks.m, except that the peak detection parameters (SlopeThreshold, AmpThreshold, smoothwidth, peakgroup, and smoothtype) can be omitted and the function will calculate trial values in the manner of autofindpeaks.m. Using the simple syntax [M,A]=autopeaks(x, y) works well in some cases, but if not, try [M,A]=autopeaks(x, y,

For determining the effect of smoothing, peak sharpening, deconvolution, or other signal enhancement methods on the areas of overlapping peaks measured by the perpendicular drop method, the Matlab/Octave function ComparePDAreas.m uses autopeaks.m to measure the peak areas of the original and processed signals, "orig" and "processed". It displays a scatter plot of original vs processed areas for each peak and returns the peak tables, P1 and P2 respectively, along with the slope, intercept, and R2 values, which should ideally be 1, 0, and 1 if the processing has had no effect at all on peak area.
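The idea behind such a comparison can be sketched without ComparePDAreas.m or autopeaks.m. The code below is a stand-in under stated assumptions, not the author's implementation: it uses a simple valley search on noiseless synthetic data, a 51-point sliding average as the processing step under test, and polyfit and corrcoef for the regression statistics. All signal parameters and variable names are invented for the example.

```octave
% Four Gaussian peaks of different heights (synthetic test signal)
x = 0:0.01:20;
y = exp(-(x-4).^2) + 2*exp(-(x-8).^2) + 3*exp(-(x-12).^2) + 4*exp(-(x-16).^2);

% Processing step under test: 51-point sliding-average smooth
w = 51;
ys = conv(y, ones(1,w)/w, 'same');

% Perpendicular drop areas of the original (row 1) and processed (row 2) signals
signals = [y; ys];
areas = zeros(2, 4);
for s = 1:2
  v = signals(s,:);
  inner = v(2:end-1);
  valleys = find(inner < v(1:end-2) & inner <= v(3:end)) + 1;  % smooth data only
  bounds = [1, valleys, numel(x)];
  for k = 1:numel(bounds)-1
    idx = bounds(k):bounds(k+1);
    areas(s,k) = trapz(x(idx), v(idx));
  end
end

% Regression of processed areas against original areas
p = polyfit(areas(1,:), areas(2,:), 1);      % p(1) = slope, p(2) = intercept
r = corrcoef(areas(1,:), areas(2,:));
R2 = r(1,2)^2;
fprintf('slope = %.4f, intercept = %.4f, R2 = %.6f\n', p(1), p(2), R2);
```

A slope near 1, an intercept near 0, and an R2 near 1 indicate that the processing step has left the measured areas essentially unchanged. The peaks need to differ in area for the regression to be meaningful.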

The Matlab/Octave automatic peak-finding function

iSignal is a downloadable user-defined Matlab function that performs various signal processing functions described in this tutorial, including measurement of peak area using Simpson's Rule and the perpendicular drop method. It can be downloaded by itself or as a ZIP file with sample data for testing. It is shown on the left applying the perpendicular drop method to a series of four peaks of equal area. (Look at the bottom panel to see how the measurement intervals, marked by the vertical dotted magenta lines, are positioned at the valley

Here's a bit of Matlab/Octave code that creates four computer-synthesized Gaussian peaks, similar to this figure, that

x=[0:.01:18];

y=exp(-(x-4).^2) + exp(-(x-9).^2) + exp(-(x-12).^2) + exp(-(x-13.7).^2);

isignal(x,y);

To use

Peak #   Position   Height    Width     Area
1        4.00       1.00      1.661     1.7725
2        9.001      1.0003    1.6673    1.77
3        12.16      1.068     2.3       1.78
4        13.55      1.0685    2.21      1.79

The area results are reasonably accurate in this example only because the perpendicular drop method roughly compensates for partial overlap between peaks, but only if the peaks are symmetrical, about equal in height, and have zero background.

For example, using the

>> peakfit([x;y],9,18,4,1,0,10,0,0,0)

Peak #   Position   Height    Width     Area
1        4          1         1.6651    1.7725
2        9          1         1.6651    1.7725
3        12         1         1.6651    1.7725
4        13.7       1         1.6651    1.7725

>> ipeak([x;y],10)

Peak #   Position   Height    Width     Area
1        4          1         1.6651    1.7727
2        9.0005     1.0001    1.6674    1.7754
3        12.16      1.0684    2.2546    2.5644
4        13.54      1.0684    2.2521    2.5615

Peaks 1 and 2 are measured accurately by

Fitting Error 0.0002165%

Peak #   Position   Height    Width     Area
1        4          1         1.6651    1.7724
2        9          1         1.6651    1.7725
3        12         1         1.6651    1.7725
4        13.7       0.99999   1.6651    1.7724

Correction for background/baseline. The presence of a baseline or background signal, on which the peaks are superimposed, will greatly influence the measured peak area if not corrected or compensated.
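To get a feel for the size of the effect, here is a small Matlab/Octave sketch that adds either a constant baseline or random noise of the same magnitude to an isolated Gaussian and compares the resulting area errors. The peak, the magnitude of 0.05, and the variable names are illustrative assumptions.

```octave
% Isolated Gaussian measured over x = -5 to 5
dx = 0.1;
x = -5:dx:5;
y = exp(-x.^2);
true_area = sqrt(pi);

amount = 0.05;   % 5% of the peak height

% (1) constant baseline offset; (2) random noise with the same standard deviation
area_base  = sum(y + amount)*dx;
area_noise = sum(y + amount*randn(size(x)))*dx;

err_base  = abs(area_base - true_area);
err_noise = abs(area_noise - true_area);
fprintf('Error from baseline: %.3f, error from noise: %.3f\n', err_base, err_noise);
```

The baseline error grows in proportion to the width of the integration range (offset times range), whereas the noise contributions partly cancel when summed, so the baseline normally dominates. The noise result changes from run to run because no random seed is set.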

Here's a Matlab/Octave experiment that compares several

iSignal, using perpendicular drop in baseline mode 1, seriously underestimates both peak areas (168.6 and 81.78).

An automated tangent skim measurement by measurepeaks is not accurate in this case because the peaks do not go all the way down to the baseline at the edges of the signal and because of the slight overlap:

An attempt to use curve fitting with

1 500 2.0001 90.005

2 700 0.99999 89.998

3 5740.2 8.7115e-007 1 1200.1

AsymmetricalAreaTest.m is a Matlab/Octave script that compares the accuracy of several peak area measurement methods for a single noisy asymmetrical peak: (A) Gaussian estimation, (B) triangulation, (C) perpendicular drop method, and curve fitting by (D) exponentially broadened Gaussian, and (E) two overlapping Gaussians. AsymmetricalAreaTest2.m is similar except that it compares the precision (standard deviation) of the areas. For a single peak with zero baseline, the perpendicular drop and curve fitting methods work equally well, both considerably better than Gaussian estimation or triangulation. The advantage of the curve fitting methods is that they can deal more accurately with peaks that overlap or that are superimposed on a baseline.
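The gap between Gaussian estimation and direct integration for an asymmetrical peak can be illustrated with a short noiseless sketch. This is not AsymmetricalAreaTest.m; the peak parameters are assumptions, the half-width is read off crudely from the sampled points, and the factor 1.0645 is the standard Gaussian relation area = 1.0645 × height × full width at half maximum.

```octave
% Exponentially broadened Gaussian built by convolution
dx = 0.1;
x = 0:dx:150;
sigma = 5;                                  % Gaussian standard deviation
tau = 10;                                   % exponential time constant
g = exp(-(x-30).^2/(2*sigma^2));            % unit-height Gaussian
true_area = sigma*sqrt(2*pi);               % area of the Gaussian before broadening

e = exp(-x/tau);
e = e/sum(e);                               % unit-sum kernel, so the area is preserved
y = conv(g, e);
y = y(1:numel(x));

% Direct integration over the whole peak (single peak, zero baseline)
area_direct = sum(y)*dx;

% Gaussian estimation from the height and the full width at half maximum
h = max(y);
above = find(y >= h/2);
fwhm = x(above(end)) - x(above(1));
area_gauss = 1.0645*h*fwhm;

fprintf('True %.3f, direct %.3f, Gaussian estimate %.3f\n', true_area, area_direct, area_gauss);
```

For this degree of asymmetry the Gaussian estimate comes out noticeably low, because the long tail holds area that the height and half-width do not account for, while direct integration recovers essentially the full area.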

Here's a Matlab/Octave experiment that creates a signal containing five Gaussian peaks with the

>> x=5:.1:65;

>> y=modelpeaks2(x, [1 5 5 5 5], [1 1 1 1 1], [10 20 30 40 50], [3 3 3 3 3], [0 -5 -10 -15 -20]);

>> plot(x,y)

The theoretical area under these Gaussians is

As the broadening is increased from left to right, the peak height


The triangle construction method (using


The automated function measurepeaks.m gives better results using the perpendicular drop method (5th column of table).

Peak   Position   PeakMax   Peak-val.   Perp drop   Tan skim
1      10         1         .99047      3.1871      3.1123
2      20.4       .94018    .92897      3.1839      3.0905
3      30.709     .83756    .81805      3.1597      2.9794
4      40.93      .74379    .70762      3.1188      2.7634
5      51.095     .66748    .61043      3.0835      2.5151

Using

But we can obtain a more accurate automated measurement of all five peaks, using

The fitting error is not much better than the simple Gaussian fit. Better results can be had using preliminary position and width results obtained from the findpeaks function or by curve fitting with a simple Gaussian fit and using those results as the "start" vector:

Even more accurate results for area are obtained using peakfit with one Gaussian and four

The latter approach works because, although the

The

Alternatively, if the objective is only to measure the peak

Next, we make a

>> [FitResults,FittingError]=peakfit([x;y],30,54,5, [1 8 8 8 8], [0 -5 -10 -15 -20] ,20, [20 3.5 25 3.5 31 3.5 36 3.5 41 3.5],0)

FitResults =
1    19.999    1.0015    2.9978    3.1958
2    25.001    1.9942    3.0165    6.4034
3    30        3.0056    2.9851    9.5507
4    34.997    3.9918    3.0076    12.78
5    40.001    4.9965    3.0021    15.966

FittingError =
0.2755

The measured areas in this case (last column) are very close to the theoretical values, whereas all the other methods give substantially poorer accuracy. The more the peaks overlap, and the more unequal the peak heights, the poorer the accuracy of the perpendicular drop and triangle construction methods. If the peaks are so overlapped that separate maxima are not visible, both methods fail completely, whereas curve fitting can often retrieve a reasonable result, but

This page is part of "
