[Introduction]  [Signal arithmetic]  [Signals and noise]   [Smoothing]   [Differentiation]  [Peak Sharpening]  [Harmonic analysis]   [Fourier convolution]  [Fourier deconvolution]  [Fourier filter]   [Peak area measurement]  [Linear Least Squares]  [Multicomponent Spectroscopy]  [Iterative Curve Fitting]  [Hyperlinear quantitative absorption spectrophotometry] [Appendix and Case Studies]  [Peak Finding and Measurement]  [iPeak]   [iSignal]  [Peak Fitters]   [iFilter]  [iPower]  [List of downloadable software]  [Interactive tools]

index previous next

Integration and peak area measurement

[Area vs height]  [Historical methods]  [perpendicular drop]  [Spreadsheets]  [Matlab/Octave]  [measurepeaks.m] [Baseline correction]  [Broadening and asymmetry]  [Interactive tools]

The symbolic integration of functions and the calculation of definite integrals are topics that are introduced in elementary Calculus courses. The numerical integration of digitized signals finds application in analytical signal processing mainly as a method for measuring the areas under the curves of peak-type signals.

Peak area measurements are very important in chromatography, a class of chemical measurement techniques in which a mixture of components is made to flow through a chemically-prepared tube or layer that allows some of the components in the mixture to travel faster than others, followed by a device called a detector that measures and records the components after separation. Ideally, the components are sufficiently separated so that each one forms a distinct peak in the detector signal. The magnitude of the peaks are calibrated to the concentration of that component by measuring the peaks obtained from "standard solutions" of known concentration.  In chromatography it is common to measure the area under the detector peaks rather than the height of the peaks, because peak area is less sensitive to the influence of peak broadening (dispersion) mechanisms that cause the molecules of a specific substance to be be diluted and spread out rather than being concentrated on one "plug" of material as it travels down the column. These dispersion effects, which arise from many sources, cause chromatographic peaks to become shorter, broader, and in some cases more unsymmetrical, but they have little effect on the total area under the peak, as long as the total number of molecules remains the same. If the detector response is linear with respect to the concentration of the material, the peak area remains proportional to the total quantity of substance passing into the detector, even though the peak height is smaller. A graphical example is shown on the left (Matlab/Octave code), which plots detector signal vs time, where the blue curve represents the original signal and the red curve shows the effect of broadening by dispersion effects. The peak height is lower and the width is greater, but the area under the curve is almost exactly the same. If the extent of broadening changes between the time that the standards are run and the time that the unknown samples are run, then peak area measurements will be more accurate and reliable than peak height measurements. (Peak height will be proportional to the quantity of material only if the peak width and shape are constant). Another example with greater broadening: (script and graphic).

If the detector response is linear with respect to the concentration of the material only at low concentrations and becomes non-linear at higher concentrations, peak height measurements will suffer because the peak maximum naturally occurs at the moment when the concentration at the detector is highest. In such cases peak area measurements will still be somewhat non-linear, but less so than peak height measurements, because most of the area of a peak is measured when the concentration of material is less than maximum.

On the other hand, peak height measurements are simpler to make and are less prone to interference by neighboring, overlapping peaks. And a further disadvantage of peak area measurement is that the peak start and stop points must be determined, which may be difficult especially of the peak overlaps other peaks. In principle curve fitting can measure the areas of peaks even then they overlap, but that requires that the shapes of the peaks be known at least approximately (however, see PeakShapeAnalyticalCurve.m described in Appendix U).

Chromatographic peaks are often described as a Gaussian function or as a convolution of a Gaussian with an exponential function. A detailed quantitative comparison of peak height and peak area measurement is given in Appendix L: Why measure peak area rather than peak height? (In spectroscopy, other broadening mechanisms are often encountered, such as Doppler broadening caused by thermal motion, which results in a Gaussian broadening function).

Before computers, there were several methods used to compute peak areas that sound strange by today's standards:
(a) plot the signal on a paper chart, cut out the peak with scissors, then weigh the cut out piece on a micro-balance compared to a square section of known area;
(b) count the grid squares under a curve recorded on gridded graph paper,
(c) use a mechanical ball-and-disk integrator,
(d) use geometry to compute the area under a triangle constructed with its sides tangent to the sides of the peak, or
(e) compute the cumulative sum of the signal magnitude and measure the heights of the resulting steps (see figure below).
But now that computing power is built into or connected to every measuring instrument, more accurate and convenient digital methods can be employed. However it is measured, the units of peak area are the product of the x and y units. Thus, in a chromatogram where the x is time in minutes and y is volts, the area is in volts-minute. In absorption spectrum where the x is nm (nanometers) and y is absorbance, the area is absorbance-nm. Because of this, the numerical magnitude of peak area will always be different from that of the peak height. If one is performing a quantitative analysis of unknown samples by means of a calibration curve, the same method of measurement must be used for both the standards and the samples, and in fact even if the measurements are inaccurate, as long as the error is the same for all standards and samples (which is why an approximate method like triangle construction works).

The best method for calculating the area under a peak depends whether the peak is isolated or overlapped with other peaks or superimposed on a non-zero baseline or not. The simple numeric integration of a digital signal, for example by Simpson's rule, will convert a series of peaks into a series of steps, the height of each of which is proportional to the area under that peak. But this works well only if the peaks are well separated from each other and if the baseline is zero. 
 
This is a commonly used method in proton NMR spectroscopy, where the area under each peak or multiplet is proportional to the number of equivalent hydrogen atoms responsible for that peak.

The classical way to handle the overlapping peak problem is to draw two vertical lines from the left and right bounds of the peak down to the x-axis and then to measure the total area bounded by the signal curve, the x-axis (y=0 line), and the two vertical lines. This is often called the perpendicular drop method; it's an easy task for a computer, although tedious to do by hand. The idea is illustrated for the second peak from the left in the figure. The left and right bounds of the peak are usually taken as the valleys (minima) between the peaks or as the point half-way between the peak center and the centers of the peaks to the left and right. Using this method it is possible to estimate the area of the second peak to an accuracy of about 0.3% and the last two peaks to an accuracy of better than 4%. (A slight improvement in the accuracy of the measured areas of the third and fourth peaks can be obtained by applying the peak sharpening technique to narrow the peaks before the perpendicular drop measurement).

Peak area measurement for overlapping peaks, using
the perpendicular drop method (vertical lines at the bottom)
and tangent skim method (shaded area)

However, that simple method works well only if the peaks are symmetrical, not too different in height, not too highly overlapped (as is the case for the first two peaks in this example), and not superimposed on a background whose area is not to be included. In the case where a peak is superimposed on a straight or broadly curved baseline, you might use the tangent skim method, which measures the area between the curve and a linear baseline drawn across the bottom of the peak (e.g. the shaded area in the figure above). In general, the hardest part of the problem and the greatest source of uncertainty is determining the shape of the baseline under the peaks and determining when each peaks begins and ends. One those are determined, you need only subtract the baseline from each point between the start and end points, add them up, and multiply by the x-axis interval. Incidentally, smoothing a signal does not change the areas under the peaks, but it may make the peak start and stop points easier to determine. The downside of smoothing is that increases peak width and the overlap between adjacent peaks.

If the shape of peaks is known, the best way to measure the areas of overlapping peaks is to use some type of least-squares curve fitting, as is discussed in the three following sections (A, B, C). If the peak positions, widths, and amplitudes are unknown, and only the fundamental peak shapes are known, then the iterative least-squares method can be employed. In many cases, even the background can be accounted for by curve fitting.

For gas chromatography and mass spectrometry specifically, I recommend Philip Wenig's OpenChrom software, an open source data system that can import binary and textual chromatographic data files directly. It includes methods to detect baselines and to measure peak areas in a chromatogram. You can set it to ignore specific "uninteresting" mass fragments (m/z) such as nitrogen (28) or water (18).  Extensive documentation is available. It is available for Windows, Linux, Solaris and Mac OS X.  A screen shot is shown on the left (click to enlarge). The program and its documentation is regularly updated by the author.




SPECTRUM, the freeware signal-processing application for Macintosh OS8, includes an integration function, as well as peak area measurement by perpendicular drop or tangent skim methods, with mouse-controlled setting of start and stop points.

Peak area measurement using spreadsheets.
CumulativeSum.xls is a simple spreadsheet template that illustrates integration of a peak-type signal by normalized cumulative sum; you can paste your own data into columns A and B. CumulativeSumExample.xls is an example with data similar to the figure at the top of this page. The Excel and Calc spreadsheets PeakDetectionAndMeasurement and CurveFitter can measure the areas under partly overlapping Gaussian peaks in time-series data, using the findpeaks algorithm and iterative non-linear curve fitting techniques, respectively. But neither is as versatile as using a dedicated chromatography program such as OpenChrom.

Peak area measurement using Matlab and Octave.
Matlab
and Octave have built-in commands for the sum of elements (“sum”, and the cumulative sum “cumsum”), trapezoidal numerical integration (“trapz”), and adaptive Simpson quadrature (“quad”).  For example, these three Matlab commands 

>> x=-5:.1:5;
>> y=exp(-(x).^2);
>> trapz(x,y)

accurately compute the area under the curve of x,y (in this case an isolated Gaussian, whose area is theoretically known to be the square root of pi sqty(pi), 1.7725). If the interval between x values, dx, is constant, then the area is simply yi=sum(y).*dx. Alternatively, the signal can be integrated using yi=cumsum(y).*dx, then the area of the peak will be equal to the height of the resulting step, max(yi)-min(yi)=1.7725.  The area of a peak is proportional to the product of its height and its width, but the proportionality constant depends on the peak shape. A Gaussian peak with a peak height h and full-width at half-maximum w has an area of 1.0645*h*w. A Lorentzian peak has an area of 1.57*h*w. (This can be confirmed by computing the area of a peak of unit height and width: e.g. for a Lorentzian peak dx=.001; x=0:dx:1500; y=lorentzian(x,750,1); trapz(x,y)). For a peak of any shape on a zero baseline, the peak width (FWHM) can be measured by the Matlab/Octave halfwidth.m function).

But the peaks in real signals have some complications: (a) their shapes might not be known; (b) they may be superimposed on a baseline; and (c) they may be overlapped with other peaks. These must be taken into account to measure accurate areas in experimental signals.

Measurepeaks.m is a Matlab/Octave function that automatically detects peaks in a s
ignal using the derivative zero-crossing method described previously. It shares the first 6 input arguments with findpeaksSG. The syntax is M=measurepeaks(x,y, SlopeThreshold, AmpThreshold, SmoothWidth, FitWidth, plots). It returns a table containing the peak number, peak position, absolute peak height, peak-valley difference, perpendicular drop area, and tangent skim area of each peak it detects. If the last input argument ('plots') is set to 1, it plots the signal with numbered peaks (shown on the left) and also plots the individual peaks (in blue) with the maximum (red circles), valley points (magenta), and tangent lines (cyan) marked as shown on the right. Type “help measurepeaks” and try the seven examples there, or run HeightAndArea.m to test the accuracy of peak height and area measurement with signals that have multiple peaks with noise, background, and some peak overlap. Generally, the values for absolute peak height and perpendicular drop area are best for peaks that have no background, even if they are slightly overlapped, whereas the values for peak-valley difference and for tangential skim area are better for isolated peaks on a straight or slightly curved background. Note: this function uses smoothing (specified by the SmoothWidth input argument) only for peak detection; it performs measurements on the raw unsmoothed y data. If the raw data are noisy, it may be beneficial to smooth the y data yourself before calling measurepeaks.m, using any smooth function of your choice. 

[M,A]=autopeaks.m is basically a combination or autofindpeaks.m and measurepeaks.m. It has similar syntax to measurepeaks.m, except that the peak detection parameters (SlopeThreshold, AmpThreshold, smoothwidth  peakgroup, and smoothtype) can be omitted and the function will calculate trial values in the manner of autofindpeaks.m. Using the simple syntax
[M,A]=autopeaks(x, y) works well in some cases, but if not try [M,A]=autopeaks(x, y, n), using different values of n (roughly the number of  peaks that would fit into the signal record) until it detects the peaks that you want to measure. Like measurepeaks, it returns a table M containing the peak number, peak position, absolute peak height, peak-valley difference, perpendicular drop area, and tangent skim area of each peak it detects, but is also can optionally return a vector A containing the peak detection parameters that it calculates (for use by other peak detection and fitting functions). For the most precise control over peak detection, you can specify all the peak detection parameters by typing M=autopeaks(x,y, SlopeThreshold, AmpThreshold, smoothwidth, peakgroup). [M,A]=autopeaksplot.m is the same but it also plots the signal and the individual peaks in the manner of measurepeaks.m (shown above). The script testautopeaks.m runs all the examples in the autopeaks help file, with a 1-second pause between each one, printing out results in the command window and additionally plotting and numbering the peaks (Figure window 1) and each individual peak (Figure window 2); it requires gaussian.m and fastsmooth.m in the path.

The Matlab/Octave automatic peak-finding function findpeaksG.m computes peak area assuming that the peak peak shape is Gaussian (or Lorentzian, for the variant findpeaksL.m). The related function findpeaksT.m uses the triangle construction method to compute the peak parameters. Even for well-separated Gaussian peaks, the area measurements by the triangle construction method is not very accurate; the results are about 3% below the correct values. (But this method does perform better than findpeaksG.m when the peaks are noticeably asymmetric; see triangulationdemo for some examples). In contrast, measurepeaks.m makes no assumptions about the shape of the peak.

iSignal is a downloadable user-defined Matlab function that performs various signal processing functions described in this tutorial, including measurement of peak area using Simpson's Rule and the perpendicular drop method. Click to view or right-click > Save link as...  here, or you can download the ZIP file with sample data for testing.
It is shown on the left applying the perpendicular drop method to a series of four peaks of equal area. (Look at the bottom panel to see how the measurement intervals, marked by the vertical dotted magenta lines, are positioned at the valley minimum on either side of each of the four peaks).

Here's a bit of Matlab/Octave code that creates
four computer-synthesized Gaussian peaks, similar to this figure, that all have the same height (1.000), width (1.665), and area (1.772) but with different degrees of peak overlap:


x=[0:.01:18];
y=exp(-(x-4).^2) + exp(-(x-9).^2) + exp(-(x-12).^2) + exp(-(x-13.7).^2);
isignal(x,y);

To use iSignal to measure the areas of each of these peaks by the perpendicular drop method, use the pan and zoom keys to position the two outer cursor lines (dotted magenta lines) in the valley on either side of the peak. The total of each peak area will be displayed below the upper window.

Peak #   Position       Height     Width      Area
   1       4.00         1.00       1.661      1.7725
   2       9.001        1.0003     1.6673     1.77
   3      12.16         1.068      2.3        1.78
   4      13.55         1.0685     2.21       1.79

 The area results are reasonably accurate in this example only because the perpendicular drop method roughly compensates for partial overlap between peaks, but only if the peaks are symmetrical, about equal in height, and have zero background.

iSignal version 5.9 includes an additional command (J key) that calls the autopeaksplot function, which automatically detects the peaks in the signal and measures their peak position, absolute peak height, peak-valley difference, perpendicular drop area, and tangent skim area. It asks you to type in the peak density (roughly the number of peaks that would fit into the signal record); the greater this number, the more sensitive it is to narrow peaks. It displays the measured peaks just as does the measurepeaks function described above. (To return to iSignal, press any cursor arrow key).

Area measurement by iterative curve fitting. In general, the most accurate peak area measurements for overlapping peaks, assuming that the basic shape of the peaks is known or can be guessed, are made with iterative least-squares peak fitting, for example using peakfit.m, shown on the right (for Matlab and Octave). This function can fit any number of overlapping peaks with model shapes selected from a list of different types. It uses the "trapz" function to calculate the area of each of the component mode peak.

 For example, using the peakfit function on the same data set as above, the results are much more accurate:

>> peakfit([x;y],9,18,4,1,0,10,0,0,0)

         Peak #    Position      Height       Width        Area
            1            4            1       1.6651       1.7725
            2            9            1       1.6651       1.7725
            3           12            1       1.6651       1.7725
            4         13.7            1       1.6651       1.7725

iPeak can also be used to estimate peak areas. It uses the same Gaussian curve fitting method as iSignal, but it has the advantage that it can detect and measure all the peaks in a signal in one operation.  For example:

>> ipeak([x,y],10)

         Peak #    Position      Height       Width        Area
            1            4            1       1.6651       1.7727
            2       9.0005       1.0001       1.6674       1.7754
            3        12.16       1.0684       2.2546       2.5644
            4        13.54       1.0684       2.2521       2.5615

Peaks 1 and 2 are measured accurately by iPeak, but the peak widths and areas for peaks 3 and 4 are not accurate because of the peak overlap.  Fortunately, iPeak has a built-in "peakfit" function (activated by the N key) that uses these peak position and width estimates as its first guesses, resulting in good accuracy for all four peaks.

Fitting Error 0.0002165%
          Peak#     Position     Height      Width         Area 
            1            4            1       1.6651       1.7724
            2            9            1       1.6651       1.7725
            3           12            1       1.6651       1.7725
            4         13.7      0.99999       1.6651       1.7724


Correction for background/baseline. The presence of a baseline or background signal, on which the peaks are superimposed, will greatly influence the measured peak area if not corrected or compensated. iSignal, iPeak, measurepeaks, and peakfit all have several different baseline correction modes, for flat, linear, and quadratic baselines, and iSignal and iPeak have a multipoint piece-wise linear baseline subtraction function allows the manually estimated background to be subtracted from the entire signal.  See iSignal.html#background_subtraction, ipeakdemo1 on PeakFindingandMeasurement.htm#demos, and CurveFittingC.html#Background_correction for examples of these background correction functions. If the baseline is actually caused by the edges of a strong overlapping adjacent peak, then it's possible to include that peak in the curve-fitting operation, as see in Example 22 on InteractivePeakFitter.htm.

Here's a Matlab/Octave experiment that compares several different methods of baseline correction in peak area measurement. The signal consists of two noiseless, slightly overlapping Gaussian peaks with theoretical peak heights of 2.00 and 1.00 and areas of 191.63 and 95.81 units, respectively. The baseline is tilted and linear, and slightly greater in magnitude than the peak heights themselves, but the most serious problem is that the signal never returns to the baseline long enough to make it easy to distinguish the signal from the baseline.

>> x=400:1:800;y=2.*gaussian(x,500,90)+1.*gaussian(x,700,90)+2.*(x./400);

iSignal, using perpendicular drop in baseline mode 1, seriously underestimates both peak areas (168.6 and 81.78). 



An automated tangent skim measurement by measurepeaks is not accurate in this case because the peaks do not go all the way down to the baseline at the edges of the signal and because of the slight overlap:

>> measurepeaks(x,y,.0001,.8,2,5,1)
    Position  PeakMax  Peak-valley  Perp drop  Tan skim
1   503.67    4.5091     1.895       672.29     171.44
2   707.44    4.5184    0.8857       761.65     76.685

An attempt to use curve fitting with peakfit.m in the flat baseline correction mode 3 - peakfit([x;y],0,0,2,1,0,1,0,3), above, left-most figure - does not really work because the actual baseline is tilted, not flat. The linear baseline mode does a little better (peakfit([x;y],0,0,2,1,0,1,0,1), second figure from left) but it's not perfect in this case. A more accurate approach is to fit the baseline as a third "peak" of a different shape, with either a Lorentzian model - peakfit([x;y],0,0,3,[1 1 2]), third signal from the left - or with a "slope" model - shape 26 in peakfit version 6, last figure on the right. The latter method gives both the lowest fitting error (less than 0.01%) and the most accurate peak areas (less than ˝% error in peak area):

>> [FitResults,FitError]=peakfit([x;y],0,0,3,[1 1 26])
FitResults =           
            1          500       2.0001       90.005       190.77   
            2          700      0.99999       89.998       95.373
            3       5740.2  8.7115e-007            1       1200.1

FitError =0.0085798

Note that in this last case the number of peaks is 3 and the shape argument is a vector [1 1 26] specifying two Gaussian components plus the "linear slope" shape 26. If the baseline seems to be non-linear, you might prefer to model it using a quadratic (shape 46; see example 38 on InteractivePeakFitter.htm#Examples). If the baseline seems to be different on either side of the peak, try modeling the baseline with an S-shape (sigmoid), either an up-sigmoid, shape 10 (click for graphic), peakfit([x;y],0,0,2,[1 10],[0 0]), or a down-sigmoid, shape 23 (click for graphic), peakfit([x;y],0,0,2,[1 23],[0 0]), in these examples leaving the peak modeled as a Gaussian.
Asymmetrical peaks and peak broadening: perpendicular drop vs curve fitting.

Here's a Matlab/Octave experiment that creates a signal containing five Gaussian peaks with the same initial peak height (1.0) and width (3.0) but which are subsequently broadened by increasing degrees of exponential broadening, similar to the broadening of peaks commonly encountered in chromatography:

>> x=5:.1:65;
>> y=modelpeaks2(x, [1 5 5 5 5], [1 1 1 1 1], [10 20 30 40 50], [3 3 3 3 3],
[0 -5 -10 -15 -20]);
>> plot(x,y)

The theoretical area under these Gaussians is all the same: 1.0645*Height*Width = 1*3*1.0645= 3.1938. A perfect area-measuring algorithm would return this number for all five peaks.

As the broadening is increased from left to right, the peak height decreases (by about 35%) and peak width increases (by about 32%). But because the area under the peak is proportional to the product of the peak height and the peak width, these two changes approximately cancel each other out and the result is that the peak area is nearly independent of peak broadening (see the summary of results in 5ExponentialBroadenedGaussianFit.xlsx). The Matlab/Octave peak-finding function findpeaksG.m, finds all five peaks and measures their areas assuming a Gaussian shape; this works well for the unbroadened peak 1 (script), but it underestimates the areas as the broadening increases in peaks 2-5:

    Peak   Position    Height     Width     Area
    1       10.0000    1.0000    3.0000    3.1938
    2
      20.4112    0.9393    3.1819    3.1819
    3
      30.7471    0.8359    3.4910    3.1066
    4
      40.9924    0.7426    3.7786    2.9872
    5
      51.1759    0.6657    4.0791    2.8910

The triangle construction method (using
findpeaksT.m) underestimates even the area of the unbroadened peak 1 and is less accurate for the broadened peaks (script; graphic):

    Peak   Position    Height     Width     Area
    1       10.0000    1.1615    2.6607    3.0905
    2
      20.3889    1.0958    2.8108    3.0802
    3
      30.6655    0.9676    3.1223    3.0210
    4
      40.8463    0.8530    3.4438    2.9376
    5
      50.9784    0.7563    3.8072    2.8795

The automated function measurepeaks.m gives better results using the perpendicular drop method (5th column of table).
 M=measurepeaks(x,y,0.0011074,0.10041,3,3,1)
  Peak Position PeakMax  Peak-val. Perp drop Tan skim
1   10        1      .99047    3.1871    3.1123
2    20.4      .94018  .92897    3.1839    3.0905 
3
30.709    .83756  .81805    3.1597    2.9794
4    40.93     .74379  .70762    3.1188    2.7634
5    51.095    .66748  .61043    3.0835    2.5151
Using iSignal and the manual peak-by-peak perpendicular drop method yields areas of 3.193, 3.194, 3.187, 3.178, and 3.231, a mean of  3.1966 (pretty close to the theoretical value of 3.1938) and standard deviation of only 0.02 (0.63%). Alternatively, integrating the signal, cumsum(y).*dx), where dx is the difference between adjacent x-axis values (0.1 in this case), and then measuring the heights of the resulting steps, gives similar results: 3.19, 3.19, 3.18, 3.17, 3.23. By either method, the peak heights are very different but the areas are closer together, yet not exactly equal.

But we can obtain a more accurate automated measurement of all five peaks, using peakfit.m with multiple shapes, one Gaussian and four exponentially modified Gaussians (shape 5) with different exponential factors (extra vector):
>> [FitResults,FittingError]=peakfit([x;y],30,54,5,[1 5 5 5 5],[0  -5 -10 -15 -20],10, 0, 0)
FitResults =
         Peak#   Position       Height       Width         Area
            1        9.9933     0.98051       3.1181       3.2541
            2       20.002      1.0316        2.8348       3.1128
            3       29.985      0.95265       3.233        3.2784
            4       40.022      0.9495        3.2186       3.2531
            5       49.979      0.83202       3.8244       3.2974
FittingError =
       2.184%
The fitting error is  not much better than the simple Gaussian fit. Better results can be had using preliminary position and width results obtained from the findpeaks function or by curve fitting with a simple Gaussian fit and using those results as the "start" vector:
>> [FitResults,FittingError]=peakfit([x;y],30,54,5, [1 5 5 5 5], [0  -5 -10 -15 -20], 10, [10 3.5 20 3.5 31 3.5 41 3.5 51 3.5], 0)
FitResults =
         Peak#   Position       Height       Width         Area
            1        9.9999     0.99995      3.0005       3.1936
            2       20          0.99998      3.001        3.1944
            3       30.001      1.0002       3.0006       3.1948
            4       40          0.99982      2.9996       3.1924
            5       49.999      1.0001       3.003        3.1243
FittingError =
       0.02%

Even more accurate results for area are obtained using peakfit with one Gaussian and four equal-width exponentially modified Gaussians (shape 8):
>> [FitResults,FittingError]=peakfit([x;y],30,54,5, [1 8 8 8 8], [0  -5 -10 -15 -20],10, [10 3.5 20 3.5 31 3.5 41 3.5 51 3.5],0)
FitResults =
         Peak#  Position   Height         Width           Area
            1    10        1.0001         2.9995        3.1929
            2    20        0.99998        3.0005        3.1939
            3    30        0.99987        3.0008        3.1939
            4    40        0.99987        2.9997        3.1926
            5    50        1.0006         2.9978        3.1207
FittingError =
       0.008%
The latter approach works
because, although the broadened peaks clearly have different widths (as shown in the simple Gaussian fit), the underlying pre-broadening peaks have all the same width. In general, if you expect that the peaks should have equal widths, or fixed widths, then it's better to use a constrained model that fits that knowledge; you'll get better estimates of the measured unknown properties, even though the fitting error will be higher than for an unconstrained model.

The disadvantages of the exponentially-broadened model are that (a) it may not be a perfect match to the actual physical broadening process; (b) it's slower that a simple  Gaussian fit, and (c) it sometimes need help, in the form of a start vector or equal-widths constraints, as seen above, in order to get the best results.

Alternatively, if the objective is only to measure the peak areas, and not the peak positions and widths, then it's not even necessary to model the physical peak-broadening of each peak. You can simply aim for a good fit using two (or more) closely-spaced simple Gaussians for each peak and simply add up the areas.of the best-fit model. For example, the 5th peak in the above example (the most asymmetrical), can be fit very well with two overlapping Gaussians, resulting in a total area of  1.9983+1.1948 = 3.1931, very close to the theoretical area of 3.1938.  Even more overlapping Gaussians can be used if the peak shape is more complex. When using this technique, it's best to set the number of trials (NumTrials, the 7th input argument of the peakfit.m function) to 10 or more; additionally, if the peak of interest is on a baseline, you must add up the areas of only those peak that contribute to fitting the peak itself and not those that are fitting the baseline. 

By making the peaks closer together, we can create a tougher and more realistic challenge.

y=modelpeaks2(x,[1 5 5 5 5],[1 1 1 1 1],[20 25 30 35 40],[3 3 3 3 3],[0 -5 -10 -15 -20]);

In this case, the triangle construction method gives areas of [3.1294 3.2020 3.3958 4.1563 4.4039], seriously overestimating the areas of the last two peaks, and measurepeaks.m using the perpendicular drop method gives areas of [3.233 3.2108 3.0884 3.0647 3.3602], compared to the theoretical value of 3.1938, better but not perfect. The integration/step height method is almost useless because the steps are no longer clearly distinct. The peakfit function does better, again using the approximate result of findpeaksG.m to supply a customized 'start' value.

>> [FitResults,FittingError]=peakfit([x;y],30,54,5,[1 8 8 8 8],[0  -5 -10 -15 -20],10, [20 3.5 25 3.5 31 3.5 36 3.5 41 3.5],0)
FitResults =
            1           20       0.99999      3.0002       3.1935
            2           25       0.99988      3.0014       3.1945
            3           30       1.0004       2.9971       3.1918
            4           35       0.9992       3.0043       3.1955
            5       40.001       1.0001       2.9981       3.1915
FittingError =
    0.01%

Next, we make a even
tougher challenge with different peak heights (1, 2, 3, 4 and 5, respectively) and a bit of added random noise. The theoretical areas (Height*Width*1.0645) are 3.1938, 6.3876, 9.5814, 12.775, and 15.969.
 
y=modelpeaks2(x,[1 5 5 5 5],[1 2 3 4 5], [20 25 30 35 40], [3 3 3 3 3], [0 -5 -10 -15 -20])+.01*randn(size(x));

>> [FitResults,FittingError]=peakfit([x;y],30,54,5, [1 8 8 8 8], [0  -5 -10 -15 -20] ,20, [20 3.5 25 3.5 31 3.5 36 3.5 41 3.5],0)
FitResults =
            1       19.999       1.0015       2.9978       3.1958
            2       25.001       1.9942       3.0165       6.4034
            3       30           3.0056       2.9851       9.5507
            4       34.997       3.9918       3.0076      12.78
            5       40.001       4.9965       3.0021      15.966
FittingError =
       0.2755

The measured areas in this case (last column) are very close to to the theoretical values, whereas all the other methods give substantially poorer accuracy. The more overlap between peaks, and the more unequal are the peak heights, the poorer the accuracy of the perpendicular drop and triangle construction methods. If the peaks are so overlapped that separate maxima are not visible, both methods fail completely, whereas curve fitting can often retrieve a reasonable result, but only if approximate first-guess values can be supplied.

index previous next
This page is part of "A Pragmatic Introduction to Signal Processing", created and maintained by Prof. Tom O'Haver , Department of Chemistry and Biochemistry, The University of Maryland at College Park. Comments, suggestions and questions should be directed to Prof. O'Haver at toh@umd.edu.   Updated August, 2017.
Unique visits since May 17, 2008: