A simple example is fitting the blackbody equation to the spectrum of an incandescent body for the purpose of estimating its color temperature. In this case there is only one nonlinear parameter, temperature. The script BlackbodyDataFit.m demonstrates the technique, placing the experimentally measured spectrum in the vectors "wavelength" and "radiance" and then calling fminsearch with the fitting function fitblackbody.m.
Another application is demonstrated by Matlab's built-in demo fitdemo.m and its fitting function fitfun.m, which model the sum of two exponential decays. (To see this, just type "fitdemo" in the Matlab command window).
Fitting peaks. Many instrumental methods of measurement produce signals in the form of peaks of various shapes; a common requirement is to measure the positions, heights, widths, and/or areas of those peaks, even when they are noisy or overlapped with one another. This cannot be done by linear least-squares methods, because such signals can not be modeled as polynomials with linear coefficients (the positions and widths of the peaks are not linear functions), so iterative curve fitting techniques are used instead, often using Gaussian, Lorentzian, or some other fundamental simple peak shapes as a model.
The Matlab/Octave demonstration script
Demofitgauss.m
demonstrates fitting a Gaussian function to a set of data, using the
fitting function fitgauss.m.
In this case there are two non-linear parameters: peak position and
peak width (the peak height is a linear parameter and is determined
by regression in a single step in line 9 of the fitting function fitgauss.m and is returned in the global variable "c"). This is easily extended to fitting two overlapping Gaussians in Demofitgauss2.m (shown on the left) using the same
fitting function (which easily adapts to any number of peaks, depending
on the length of the first-guess "start" vector). These functions
call the user-defined peak shape function gaussian.m.
Similar procedures can be defined for other peak shapes simply by
calling the corresponding peak shape function, such as lorentzian.m. (Note: in order for scripts like Demofitgauss.m or Demofitgauss2.m to work on your version of Matlab, all the functions that they call must be loaded into Matlab beforehand, in this case fitgauss.m and gaussian.m).
You can create your own fitting functions for any purpose; they are not limited to single algebraic expressions, but can be arbitrarily complex multi-step algorithms. For example, in the TFit method for quantitative absorption spectroscopy, a model of the instrumentally-broadened transmission spectrum is fit to the observed transmission data, using Matlab's fminsearch function, in order to extend the dynamic range and calibration linearity of absorption spectroscopy beyond the normal limits, using a fitting function that performs Fourier convolution of the transmission spectrum with the known slit function of the spectrometer.
The uncertainty of the peak parameters determined by iterative least-squares fitting is
readily estimated by the bootstrap sampling method. A simple demonstration of bootstrap
estimation of the variability of an iterative least-squares fit
to a single noisy Gaussian peak is given by the custom downloadable
Matlab/Octave function "BootstrapIterativeFit.m",
which creates a single x,y data set consisting of a single noisy
gaussian peak, extracts bootstrap samples from that data set, performs
an iterative fit to the peak on each of the bootstrap samples, and
plots the histograms of peak height, position, and width of the
bootstrap samples. The syntax is BootstrapIterativeFit(TrueHeight, TruePosition, TrueWidth, NumPoints, Noise, NumTrials) where TrueHeight is the true peak height of the Gaussian peak, TruePosition is the true x-axis value at the peak maximum, TrueWidth
is the true half-width (FWHM) of the peak, NumPoints is the number of
points taken for the least-squares fit, Noise is the standard deviation
of (normally-distributed) random noise, and NumTrials is the number of
bootstrap samples. An typical example for BootstrapIterativeFit(100,100,100,20,10,100); is displayed in the figure on the right.
>> BootstrapIterativeFit(100,100,100,20,10,100);
Peak Height Peak Position Peak Width
mean: 99.27028 100.4002 94.5059
STD: 2.8292 1.3264 2.9939
IQR: 4.0897 1.6822 4.0164
IQR/STD Ratio: 1.3518
A similar demonstration function for two overlapping Gaussian peaks is available in "BootstrapIterativeFit2.m". Type "help BootstrapIterativeFit2" for more information. In both these simulations, the standard deviation (STD) as well as the interquartile range (IQR) of each of the peak parameters are computed. This is done because the interquartile range is much less influenced by outliers. The distribution of peak parameters measured by iterative fitting is often non-normal, exhibiting a greater fraction of large deviations from the mean than is expected for a normal distribution. This is because the iterative procedure sometimes converges on an abnormal result, especially for multi-peak fits with a large number of variable parameters. (You may be able to see this in the histograms plotted by these simulations, especially for the weaker peak in BootstrapIterativeFit2). In those cases the standard deviaiton will be too high because of the outliers, and the IQR/STD ratio will be much less than the value of 1.34896 that is expected for a normal distribution. In that case a better estimate of the standard deviation of the central portion of the distribution (without the outliers) is IQR/1.34896.
It's instructive to compare the iterative least-squares method with classical least-squares curve fitting, discussed in the previous section,
which can also fit peaks in a signal. The difference is that in the
classical least squares method, the positions, widths, and shapes of
all the individual components are all known beforehand; the only
unknowns are the amplitudes (e.g. peak heights) of the components in
the mixture. In non-linear iterative curve fitting, on the other hand, the positions,
widths, and heights of the peaks are all unknown beforehand; the only thing that is known is the fundamental underlying shape of the peaks. It
turns out
that the non-linear iterative curve fitting is much more difficult to
do and more prone to error, but it's necessary if you need to track
shifts in peak position or width or to decompose a complex overlapping
peak signal into fundamental components knowing only their shape. The Matlab/Octave script “CLSvsINLS.m”
compares the classical least-squares (CLS) method with three different
variations of the iterative method (INLS) method for measuring the peak
heights of three Gaussian peaks in a noisy test signal, demonstrating
that the fewer the number of unknown parameters, the faster and more
accurate is the peak height calculation.
Another comparison of multiple measurement techniques is presented in Case Study C.
Note: you can right-click on any of the m-file links on this page and select Save Link As... to download them to your computer, then place them in the Matlab path for use within Matlab.
Peak fitting functions and scripts that uses an
unconstrained non-linear
optimization algorithm
to decompose a complex,
overlapping-peak time-series signal into its component peaks. The objective is to
determine whether your signal can be represented as the sum of
fundamental underlying peaks shapes. These functions accept signals of any
length,
including those with non-integer and non-uniform x-values, and can fit any number of peaks with Gaussian, equal-width Gaussian, fixed-width Gaussian, exponentially-broadened Gaussian, exponentially-broadened equal-width Gaussians, bifurcated Gaussian, Lorentzian, fixed-width Lorentzians, equal-width Lorentzians, bifurcated Lorentzian, Logistic, Pearson 7, exponential pulse, sigmoid, and Gaussian/Lorentzian blend shapes. There are two different versions, a command line version called peakfit.m, for Matlab or Octave, and a keypress operated interactive version called ipf.m, for Matlab only. [Model errors] [Number of peaks] [Peak width] [Background correction] [Random noise] [Iterative fitting errors] [Exponential broadening] [Effect of smoothing]
Iterative curve fitting is often
used to measure the position, height, and width of peaks in a signal,
especially when they overlap significantly. There
are four major sources of error in measuring these peak parameters by
iterative curve fitting:
a. Model errors.

Peak shape. If
you have the wrong model for your peaks, the results can not be
expected to be accurate; for instance, if your actual peaks are
Lorentzian in shape, but you fit them with a Gaussian model, or vice versa. For example, a single isolated
Gaussian peak at x=5, with a height of 1.000 fits a Gaussian model
virtually perfectly, using the Matlab
user-defined peakfit
function, as shown on the right. (The 5th input argument for the
peakfit function specifies the shape of peaks to be used in the fit;
"1" means Gaussian).
>> x=[0:.1:10];y=exp(-(x-5).^2);
>> [FitResults,MeanFitError]=peakfit([x' y'],5,10,1,1)
Peak#
Position Height
Width Area
FitResults =
1
5
1
1.6649 1.7724
MeanFitError =
0.001679
The
"FitResults" are, from left to right, peak number, peak position, peak
height, peak width, and peak area. The MeanFitError, or just "fitting
error", is the square root of the sum of the squares of the differences between the data and the best-fit model,
as a percentage of the maximum signal in the fitted region.
(Note the good agreement of the area measurement (1.7724) with the theoretical area under the curve of exp(-x2), which is the square root of p, or 1.7725...).
But this same peak, when fit with a Logistic model (peak shape number 3), gives a fitting error of 1.4% and height and width errors of 3% and 6%, respectively:
>> [FitResults,MeanFitError]=peakfit([x' y'],5,10,1,3)
Peak#
Position Height
Width Area
FitResults =
1
5.0002
0.96652
1.762 1.7419
MeanFitError =
1.4095
When
fit with a Lorentzian model (peak shape 2, shown on the right), this peak gives a 6% fitting error and
height and width errors of 8% and 20%, respectively.
>> [FitResults,MeanFitError]=peakfit([x' y'],5,10,1,2)
FitResults =
Peak#
Position Height
Width Area
1
5
1.0876
1.3139 2.0579
MeanFitError =
5.7893
So clearly the larger the fitting errors, the larger are the parameter errors, but the parameter errors are of course not equal to the fitting error (that would just be too easy). Also, clearly the peak width and area are the parameters most susceptible to errors. The peak positions,
as you can see here, are measured accurately even if the model is way
wrong, as long as the peak is symmetrical and not
highly overlapping with other peaks.
Number of peaks. Another
source of model error occurs if you have the wrong number of peaks in your
model, for
example if the signal actually has two peaks but you try to fit it
with only one peak. In the example below, a line of Matlab code generates a simulated signal with of two Gaussian peaks at
x=4 and x=6
with peaks heights of 1.000 and 0.5000 respectively and widths of
1.665, plus random noise with a standard deviation 5% of the
height of the largest peak (a signal-to-noise ratio of 20):
>>x=[0:.1:10];y=exp(-(x-6).^2)+.5*exp(-(x-4).^2)+.05*randn(size(x));
In a real experiment you would not
usually know the peak positions, heights, and widths; you would be
using curve fitting to measure those parameters. Let's assume that, on
the basis of previous experience or some preliminary trial fits, you
have established that the optimum peak shape model is Gaussian, but you
don't know for sure how many peaks are in this group. If
you start out by fitting this signal with a single-peak Gaussian model, you get:
>> [FitResults,MeanFitError]=peakfit([x' y'],5,10,1,1)
FitResults =
Peak#
Position Height
Width Area
1
5.5291
0.86396
2.9789 2.7392
MeanFitError =
10.467
The residual plot shows a "wavy"
structure that's visible in the random scatter of points due to the
random noise in the signal. This means that the fitting error is not
limited by the random noise; it is a clue that the model is not
quite right.
But a fit with two peaks yields much better results (The 4th input argument for the peakfit function specifies the number of peaks to be used in the fit).
>> [FitResults,MeanFitError]=peakfit([x' y'],5,10,2,1)
FitResults =
Peak#
Position Height
Width Area
1
4.0165
0.50484
1.6982 0.91267
2
5.9932
1.0018
1.6652 1.7759
MeanFitError =
4.4635
Now the residuals have a random scatter
of points, as would be expected if the signal is
accurately fit except for the random noise. Moreover, the fitting
error is much lower (less that half) of the error with only one
peak. In fact, the fitting error is just about what we would
expect in this case based on the 5% random noise in the signal
(estimating the relative standard deviation of the points in the
baseline visible at the edges of the signal). Because this is a
simulation in which we know beforehand the true values of the peak
parameters (peaks at
x=4 and x=6
with peaks heights of 1.0 and 0.50 respectively and widths of
1.665), we can actually calculate the
parameter errors (the difference between the real peak positions,
heights, and widths and the measured values). We see that they are
quite accurate (in this case within about 1% relative on the peak
height and 2% on the widths), which is
actually better than the 5% random noise in this signal because of
the
averaging effect of fitting to multiple data points in the
signal.
But if going from one peak to two peaks gave
us a better fit, why not go to three peaks? Changing the number of
peaks to three gives these results:
>> [FitResults,MeanFitError]=peakfit([x' y'],5,10,3,1)
FitResults =
Peak# Position
Height Width Area
1 4.0748
0.51617
1.7874 0.98212
2 6.7799
0.089595 2.0455 0.19507
3 5.9711
0.94455 1.53 1.5384
MeanFitError =
4.3878
The fitting algorithm has now tried to fit an additional low-amplitude peak (numbered peak 2 in this case) located at x=6.78. The fitting error is actually lower that for the 2-peak fit, but only slightly lower, and the residuals are no less visually random that with a 2-peak fit. So, knowing nothing else, a 3-peak fit might be rejected on that basis alone. In fact, there is a serious downside to fitting more peaks than are present in the signal: it increases the parameter measurement errors of the peaks that are actually present. Again, we can prove this because we know beforehand the true values of the peak parameters: clearly the peak positions, heights, and widths of the two real peaks than are actually in the signal (peaks 1 and 3) are significantly less accurate that of the 2-peak fit.
Moreover, if
we repeat that fit with the same signal but with a different sample
of random noise (simulating a repeat measurement of a stable
experimental signal in the presence or random noise), the additional
third peak in the 3-peak fit will
bounce around all over the place (because the third peak is actually
fitting the random noise, not an actual peak in the signal).
>> x=[0:.1:10];y=exp(-(x-6).^2)+.5*exp(-(x-4).^2)+.05*randn(size(x));
>> [FitResults,MeanFitError]=peakfit([x' y'],5,10,3,1)
FitResults =
Peak#
Position Height
Width Area
1
4.115
0.44767
1.8768 0.89442
2 5.3118
0.093402
2.6986 0.26832
3
6.0681
0.91085
1.5116 1.4657
MeanFitError =
4.4089
With this new set of data, two of the
peaks (numbers 1 and 3) have roughly the same position, height, and
width, but peak number 2 has changed substantially compared to the
previous run. Now we have an even more compelling reason to
reject the 3-peak model: the 3-peak solution is not stable. And
because this is a simulation in which we know the right answers, we can
verify that the accuracy of the peak heights is substantially poorer
(about 10% error) than expected with this level of random noise in the
signal (5%). If we were to run a 2-peak fit on the same new data, we
get much better measurements of the peak heights.
>> [FitResults,MeanFitError]=peakfit([x' y'],5,10,2,1)
FitResults =
Peak# Position Height
Width Area
1
4.1601
0.49981
1.9108 1.0167
2
6.0585
0.97557
1.548 1.6076
MeanFitError =
4.4113
If this is repeated several times, it turns out that the peak parameters of
the peaks at x=4 and x=6 are, on average, more accurately measured by
the 2-peak fit. In practice,
the best way to evaluate a proposed fitting model is to fit several
repeat measurements of the same signal (if that is practical
experimentally) and to compute the standard deviation of the peak
parameter values.
Peak width constraints. Finally, there is one more thing that we can do that might improve the peak parameter measurement accuracy, and it concerns the peak widths. In all the above simulations, the basic assumption that all the peak parameters were unknown and independent of one another. In some types of measurements, however, the peak widths of each group of adjacent peaks are all expected to be equal to each other, on the basis of first principles or previous experiments. This is a common situation in analytical chemistry, especially in atomic spectroscopy and in chromatography, where the peak widths are determined largely by instrumental factors. In the current simulation, the true peaks widths are in fact both equal to 1.665, but all the results above show that the measured peak widths are close but not quite equal, due to random noise in the signal.
But we can introduce an equal-width constraint into the fit by
using peak shape 6 (Equal width Gaussians) or peak shape 7 (Equal-width Lorentzians) in peakfit version 1.4 and later. Using peak shape 6 on the same set of data as the previous example:
>> [FitResults,MeanFitError]=peakfit([x' y'],5,10,2,6)
FitResults =
Peak# Position Height
Width Area
1 4.0293 0.52818
1.5666 0.8808
2
5.9965
1.0192 1.5666 1.6997
MeanFitError =
4.5588
This "equal width" fit forces all the peaks within one group to have
exactly the same width, but that width is determined by the program from the data. The result is a slightly higher fitting error (in this
case 4.5 rather than 4.4), but - perhaps surprisingly - the peak
parameter measurements are usually more accurate and more reproducible (Specifically,
the
relative standard deviations are on average lower for the
equal-width fit than for an unconstrained-width fit to the same data,
assuming of course that the true underlying peak widths are really
equal). This is an exception to the general expectation that lower
fitting errors result in lower peak parameter errors. It is an
illustration of the general rule that the more you know about the
nature of your signals, and the closer your chosen model adheres to
that knowledge, the better the results. In this case we knew that
the peak shape was Gaussian (although we could have verified that
choice by trying other candidate peaks shapes). We determined
that the number of peaks was two by inspecting the residuals and
fitting errors for 1, 2, and 3 peak models. And then we
introduced the constraint of equal peak widths within each group of
peaks (based on prior knowledge of the experiment rather than on
inspection
of residuals and fitting errors). Not every experiment yields peaks of
equal width, but when it does, it's better to make use of that
constraint.
Going one step beyond equal widths, in peakfit version 2.6 (and ipf 8.6) and later, you can also specify a fixed-width Gaussian or Lorentzian shapes (shape numbers 11 and 12), in which the width of the peaks are not only equal to each other but are known beforehand and are specified in input argument 10, rather than being determined from the data as in the equal-width fit above. Introducing this constraint onto the previous example, and supplying an almost-accurate width as the 10th input argument:
>> [FitResults,MeanFitError]=peakfit([x' y'],0,0,2,11,0,0,0,0,1.666)
FitResults =
Peak#
Position Height
Width Area
1
3.9943
0.49537
1.666 0.8785
2
5.9924
0.98612
1.666 1.7488
MeanFitError =
4.8128
Comparing to the previous equal-width fit, the fitting error is larger here (because there are fewer degrees of freedom to minimize the error), but the parameter errors, particularly the peaks heights, are more accurate because the width information provided in the input argument was more accurate (1.666) than the width determined by the equal-width fit (1.5666). Again, not every experiment yields peaks of known width, but when it does, it's better to make use of that constraint.
Note that if the peak positions are also known, and only the peak heights are unknown, you don't even need to use the iterative fitting method at all; you can use the much easier and faster multilinear regression technique (“classical least squares”).
b. Background correction.
The peaks that are measured in many scientific instruments are
sometimes superimposed on a non-specific background. Ordinarily
the experiment protocol is designed to minimize the background or to
compensate for the background, for example by subtracting a "blank"
signal
from the signal of an actual specimen. But even so
there is often a residual background that can not be eliminated
completely experimentally. The origin and shape of that background
depends on the specific measurement method, but often this background
is
a broad, tilted, or curved shape, and the peaks of interest are
comparatively
narrow features superimposed on that background. The presence of the
background has little effect on the peak positions, but it is
impossible to measure the peak heights, width, and areas accurately
unless something is done to account for the background.
There are various
sophisticated methods described in the literature for estimating and
subtracting the background in such cases. The simplest assumption is
that the background is locally linear,
that is, can be approximated as a straight line in the local region of
group of peaks being fit together. This is easily accomplished if
each group of peaks is sufficiently isolated from other groups that the
signal returns to the local background between the groups of peaks, and
this is the basis of the "autozero" mode in the ipf.m and iPeak.m functions. When the autozero mode of these functions is turned on (T key), a straight-line baseline connecting
the two ends of the signal segment in the upper panel will be
automatically subtracted as the pan and zoom controls are used to isolate the group of overlapping peaks to be fit.
Example of an
experimental chromatographic signal. From
left to right, (1) Raw data with peaks superimposed on baseline. One
group of peaks is selected using the the pan and zoom controls,
adjusted so that the signal returns to the local background at the
edges of the segment displayed in the upper window; (2) The linear
baseline is subtracted when the autozero mode is turned on in ipf.m;
(3)
Fit with a three-peak Gaussian model, activated by pressing 3, G, F (3 peaks, Gaussian, Fit).
Alternatively, it may be
better to subtract the background from the entire signal first, before
futher operations are performed. As before, the simplest
assumption is that the background is piecewise linear,
that is, can be approximated as a series of small straight line
segments. This is the basis of the multi-point background
subtraction mode in ipf.m, iPeak.m, and in iSignal.
The user enters the number of points that is thought to be sufficient
to define the baseline, then clicks where the baseline is thought to be
along the entire length of the signal in the lower whole-signal display
(e.g. on the valleys between the peaks). After
the last point is clicked, the program interpolates between the clicked
points and subtracts the piecewise linear background from the original signal.
From
left to right, (1) Raw data with peaks superimposed on baseline. (2)
Background subtracted from the entire signal using the multi-point
background subtraction function in iPeak.m (ipf.m and iSignal have the same function).
In some cases the background may be
caused by a broad peak whose maximum falls outside of the range of data
acquired. This is not an uncommon situation with peaks shapes
that have gradually sloping sides, such as the Lorentzian. There is no
explicit provision for this in the current version of my programs;
however, if the background peak has the same shape as the measured
peaks, it is possible to fit the off-screen peak in ipf.m
simply by increasing the number of peaks by one. It may help to the
cursor-defined start position (C key) to define the start position for
the background peak beyond the x-axis range of the plot, but still
inside the Figure window. (Don't use the equal-width shapes for
this, because it's likely that the background peak is broader than the
measured peaks).
c. Random noise in the signal.
Any experimental signal has a certain amount of random noise,
which means that the individual data points scatter randomly above and
below their mean values. The assumption is ordinarily made that
the scatter is equally above and below the true signal, so that the
long-term average approaches the true mean value; the noise "averages
to zero" as it is often said. The practical problem is that any
given recording of the signal contains only one finite sample of the
noise. If another recording of the signal is made, it will
contain another independent sample of the noise. These noise samples are
not infinitely long and therefore do not represent the true long-term
nature of the noise. This presents two problems: (1) an individual
sample of the noise will not "average to zero" and thus the
parameters of the best-fit model will not necessarily equal the true
values, and (2) the magnitude of the noise during one sample might not
be typical; the noise might have been randomly greater or smaller than
average during that time. This means that the mathematical "propagation
of error" methods, which seek to estimate the likely error in the model
parameters based on the noise in the signal, will be subject to error
(underestimating the error if the noise happens to be lower than average
and overestimating the errors if the noise happens to be larger than average).
A better way to estimate the parameter errors is to record multiple samples of the signal, fit each of those separately, compute the models parameters from each fit, and calculate the standard error of each parameter. But if that is not practical, it is possible to simulate such measurements by adding random noise to a model with known parameters, then fitting that simulated noisy signal to determine the parameters, then repeating the procedure over and over again with different sets of random noise. This is exactly what the script DemoPeakfit.m (which requires the peakfit.m function) does for simulated noisy peak signals such as those illustrated below. It's easy to demonstrate that, as expected, the average fitting error precision and the relative standard deviation of the parameters increases directly with the random noise level in the signal. But the precision and the accuracy of the measured parameters also depend on which parameter it is (peak positions are always measured more accurately than their heights, widths, and areas) and on the peak height and extent of peak overlap (the two left-most peaks in this example are not only weaker but also more overlapped that the right-most peak, and therefore exhibit poorer parameter measurements). In this example, the fitting error is 1.6% and the percent relative standard deviation of the parameters ranges from 0.05% for the peak position of the largest peak to 12% for the peak area of the smallest peak.
The errors in the values of peak parameters measured by curve fitting depend not only on the characteristics of the peaks in question and the signal-to-noise ratio, but also upon other peaks that are overlapping it. From left to right: (1) a single peak at x=100 with a peak height of 1.0 and width of 30 is fit with a Gaussian model, yielding a relative fit error of 4.9% and relative standard deviation of peak position, height, and width of 0.2%, 0.95%, and 1.5% , respectively. (2) The same peak, with the same noise level but with another peak overlapping it, reduces the relative fit error to 2.4% (because the addition if the second peak increases overall signal amplitude), but increases the relative standard deviation of peak position, height, and width to 0.84%, 5%, and 4% - a seemingly better fit, but with poorer precision for the first peak. (3) The addition of a third peak further reduces the fit error to 1.6% , but the relative standard deviation of peak position, height, and width of the first peak are still 0.8%, 5.8%, and 3.64%, about the same as with two peaks, because the third peak does not overlap the first one significantly.
If the average noise noise in the signal is not known or its probability distribution is uncertain, it is possible to use the bootstrap sampling method to estimate the uncertainty of the peak heights, positions, and widths, as described above. Version 8 of the keypress operated interactive version of ipf.m has added a function (activated by the 'v' key) that estimates the expected standard deviation of the peak parameters using this method.
One way to reduce the effect of noise is to take more data. If
the experiment makes it possible to reduce the x-axis interval between
points, or to take multiple readings at each x-axis values, then the
resulting increase in the number of data points in each peak should
help reduce the effect of noise. As a demonstration, using the script DemoPeakfit.m to
create a simulated overlapping peak signal like that shown above right,
it's possible to change the interval between x values and thus the
total number of data points in the signal. With a noise level of 1% and
75 points in the signal, the fitting error is 0.35 and the average
parameter error is 0.8%. With 300 points in the signal and the same
noise level, the fitting error is essentially the same, but the average
parameter error drops to 0.4%, suggesting that the accuracy of the
measured parameters varies inversely with the square root of the number
of data points in the peaks.
The
figure on the right illustrates the importance of sampling interval and
data density. You can download the data file "udx" in TXT format or in Matlab MAT
format. The signal consists of two Gaussian peaks, one located at x=50
and the second at x=150. Both peaks have a peak height of 1.0 and a
peak
half-width of 10, and normally-distributed random white noise with a
standard deviation of 0.1 has been added to the entire signal. The
x-axis sampling interval, however, is different for the two peaks; it's
0.1 for the first peaks and 1.0 for the second peak. This means
that
the first peak is characterized by ten times more points that the
second peak. When you fit these peaks separately to a Gaussian
model (e.g., using peakfit.m or ipf.m), you will find that all the
parameters of the first peak are measured more accurately than the
second, even though the fitting error is not much different:
First
peak: Second
peak:
Percent Fitting Error = 7.6434% Percent Fitting Error = 8.8827%
Peak# Position Height Width Peak# Position Height Width
1 49.95 1.0049 10.111
1
149.64 1.0313 9.9415
d. Iterative fitting errors.
Unlike multiple linear regression curve fitting, iterative
methods may not converge on the exact same model parameters each time the fit
is repeated with slightly different starting values (first
guesses). The Interactive Peak Fitter makes it easy to test this, because it uses slightly different starting values each time the signal is fit (by pressing the F key in ipf.m, for example). Even better, by pressing the X
key, the ipf.m function silently computes 10 fits with different starting
values and takes the one with the lowest fitting error. A
basic assumption of any curve fitting operation is that the fitting
error (the RMS difference between the model and the data) is minimized,
the parameter errors (the difference between the actual parameters and
the parameters of the best-fit model) will also be
minimized. This is generally a good assumption, as demonstrated by the
graph to the right, which shows typical percent parameters errors as a
function of fitting error for the left-most peak in one sample of the simulated signal generated by DemoPeakfit.m (shown in the previous section). The variability of the fitting error here is caused
by random small variations in the first guesses, rather than by random
noise in the signal. In many practical cases there is enough
random noise in the signals that the iterative fitting errors within one sample of the signal are small
compared to the random noise errors between samples.
Remember that the variability in measured peak parameters from fit to fit of a single sample of the signal is not a good estimate of the precision or accuracy of those parameters, for the simple reason that those results represent only one sample of the signal, noise, and background. The sample-to-sample variations are likely to be much greater than the within-sample variations due to the iterative curve fitting. (In this case, a "sample" is a single recording of signal). To estimate the contribution of random noise to the variability in measured peak parameters when only a single sample if the signal is available, the bootstrap method can be used.
e. A difficult case. As a dramatic example of the ideas in parts c and d, consider this simulated example signal, consisting of two Gaussian peaks of equal height = 1.00, overlapping closely enough so that their sum is a single symmetrical peak.
If there were no noise in the signal, the peakfit.m or ipf.m routines could easily extract the two equal Gaussian components to an accuracy of 1 part in 1000.
>> peakfit([x y],5,19,2,1)
Peak# Position Height Width AreaBut in the presence of even a little noise (for example, 1% RSD), the results are uneven; one peak is almost always significantly higher than the other:
Peak#
Position Height
Width Area
1
4.4117
0.83282 1.61 1.4273
2
5.4022 1.1486 1.734 2.12
The fit is stable with any one sample of noise (if peakfit was run again with slightly different starting values, or if the F key were pressed several times in ipf.m), so the probem is not iterative fitting errors casued by different starting values. The problem is the noise: although the signal is completely symmetrical, any particular sample of the noise is not perfectly symmetrical (e.g. the first half of the noise usually averages a slightly higher or lower than the second half, resulting in an asymmetrical fit result). The surprising thing is that the error in the peak heights are much larger (about 15% relative, on average) than the random noise in the data (1% in this example). So even though the fit looks good - the fitting error is low (less than 1%) and the residuals are random and unstructured - still the model parameters can still be very far off. If you were to make another measurement (i.e. generate another independent set of noise), the results would be different but still inaccurate (the first peak has an equal chance of being larger or smaller than the second). Unfortunately, the expected error is not accurately predicted by the bootstrap method, which seriously underestimates the standard deviation of the peak parameters with repeated measurements of independent signals (because a bootstrap sub-sample of asymmetrical noise is likely to remain asymmetrical). A Monte Carlo simulation would give a more reliable estimation of uncertainty in such cases.
Better results can be obtained in cases where the peak widths are expected to be equal, in which case you can use peak shape 6 (equal-width Gaussian) instead of peak shape 1: peakfit([x y],5,19,2,6). It also helps to provide decent first guesses (start) and to set the numbver ot trials (NumTrials) to a number above 1): peakfit([x,y],5,10,2,6,0,10,[4 2 5 2],0,0).
So,
to sum up, we can make the following observations about the accuracy of
model parameters: (1) the parameter errors depend on the peak shape and
number of peaks; (2) the parameter errors
are directly proportional to the noise in the data and to the fitting
error; (3) the errors are
typically least for peak position and worse for peak width and area;
(4) the errors depend on the data density (number of independent data
points in the width of each peak) and on the extent of peak overlap
(the parameters of isolated peaks are easier to measure than highly
overlapped peaks); (5) if only a single signal is available, the effect
of noise on the standard deviation of the peak parameters in many cases can be
predicted approximately by the bootstrap method, but if the overlap of the peaks is too great, the error of the parameter measurements can be much greater than predicted.
Appendix a: Fitting signals that are subject to exponential broadening.
DataMatrix2 is a computer-generated test signal consisting of 16 symmetrical Gaussian peaks with random whi
te
noise added. The peaks occur in groups of 1, 2, or 3 overlapping peaks,
but the peak maxima are located at exactly integer values of x from 300
to 3900 (on the 100's) and the peak widths are always exactly 60
units. The peak heights vary from 0.06 to 1.85. The
standard deviation of the noise is 0.01. You can use this signal
to test curve-fitting programs and to determuine the accuracy of their
measurements of peak parameters. Right-click and select "Save" to
download this signal, put it in the Matlab path, then type "load DataMatrix2" at the command prompt to load it into the Matlab workspace.
DataMatrix3 i
s a exponentially broadened
version of DataMatrix2, with a "time constant" of 33 points on the
x-axis. The result of the exponential broadening is that all the peaks in this signal are asymmetrical, their peak maxima are shifted to longer x values, and their peak heights are smaller and their peak widths are larger
than the corresponding peaks in DataMatrix2. Also, the random noise is
damped in this signal compared to the original and is no longer
"white", as a consequence of the broadening. This type of effect
is common in physical measurements and often arises from some physical
or electrical effect in the measurement system that is apart from the
fundamental peak characteristics. In such cases it is usualy
desirable to compensate for the effect of the broadening, either by deconvolution
or by curve fitting, in an attempt to measure what the peak parameters
would have been before the broadening (and also to measure the
broadening itself). This can be done for Gaussian peaks that are
exponentially broadened by using the "ExpGaussian" peak shape in
peakfit.m and ipf.m. Right-click and select "Save" to download this signal, put it in the Matlab path, then type "load DataMatrix3" to load it into the Matlab workspace. 
The example illustrated on the right focuses on the single isolated peak whose "true" peak position, height, width, and area in the original unbroadened signal, are 2800, 0.52, 60, and 33.2 respectively.
(The relative standard deviation of the noise is 0.01/0.52=2%.) In the
broadened signal, the peak is visibly asymmetrical, the peak maximum is
shifted to larger x values, and
it has a shorter height and larger width, as demonstrated by the
attempt
to fit a normal (symmetrical) Gaussian to the broadened peak. (The peak
area, on the other
hand, is not much effected by the broadening).
The
large "wavy" residual plot is a tip-off that the model is not quite
right. Moreover, the fitting error (1.2%) is larger than expected for a
peak with a half-width of 60 points and a 2% noise RSD (approximately
2%/sqrt(60)=0.25%).

Fitting to an exponentially-broadened Gaussian (pictured on the right) gives a much lower fitting
error ("Percent error") and a more nearly random residual plot. But the
interesting thing is that it also recovers the original peak
position, height, and width to an accuracy of a fraction of 1%.
In performing this fit, the time constant ("extra") was experimentally
determined from the broadened
signal by adjusting it with the A and Z keys to give the lowest fitting
error; that also
gives a reasonably good measurement of the broadening factor (32.6, vs the actual
value of 33).
Peak Shape = Exponentially-broadened Gaussian
Autozero ON
Number of peaks = 1
Extra = 32.6327
Fitted range = 2640 - 2979.5 (339.5) (2809.75)
Percent Error = 0.21696
Peak#
Position Height
Width Area
1
2800.1302 0.51829906
60.086295 33.152429
Comparing the two methods, the exponentially-broadened Gaussian fit recovers all the underlying peak parameters quite accurately:
| Position |
Height |
Width |
Area |
|
| Actual peak parameters |
2800 |
0.52 |
60 |
33.2155 |
| Gaussian fit to broadened signal |
2814.832 |
0.45100549 |
68.441262 |
32.859436 |
| ExpGaussian fit to broadened signal | 2800.1302 | 0.51829906 |
60.086295 |
33.152429 |
Other peaks in the same signal, under the broadening influence of the
same time constant, can be fit with similar settings, for example the
set of three overlapping peaks near x=2400. As before, the peak
positions are recovered almost
exactly and even the width measurements are reasonably accurate (1% or
better). (The
smaller fitting error evident here is just a reflection of the larger
peak heights in this particular group of peaks - the noise is the same everywhere
in this signal).

Peak Shape = Exponentially-broadened Gaussian
Autozero OFF
Number of peaks = 3
Extra = 31.9071
Fitted range = 2206 - 2646.5 (440.5) (2426.25)
Percent Error = 0.11659
Peak#
Position Height
Width Area
1 2300.2349 0.83255884
60.283214 53.422354
2 2400.1618 0.4882451 60.122977 31.24918
3
2500.3123 0.85404245
60.633532 55.124839
The residual plots in both of these examples still have some "wavy" character, rather than being completly random and "white". The exponential broadening smooths out any white noise in the original signal that is introduced before the exponential effect, acting as a low-pass filter in the time domain and resulting in a low-frequency dominated "pink" noise, which is what remains in the residuals after the broadened peaks have been fit as well as possible. On the other hand, white noise that is introduced after the exponential effect would continue to appear white and random on the residuals. In real experimental data, both types of noise may be present in varying amounts.

One
final caveat: peak asymmetry similar to exponential broadening could possibly be the result a pair of closely-spaced peaks of different
peak heights. In fact, a single exponential broadened Gaussian
peak can sometimes be fit with two [Effect of smoothing] symmetrical Gaussians to a fitting error at
least as low as a single exponential
broadened Gaussian fit. This makes it hard to distinguish between
these two models on the basis of fitting error alone. However, this can
usually be decided by inspecting the other peaks in the signal: in most
experiments, the same exponental broadening applies to every peak in
the signal, and the broadening is either constant or changes gradually
over the length of the signal. On the other hand, it is very unlikely
that every peak in the signal will be accompanied by a smaller side
peak that varies in this way. So, if a only one or a few of the peaks exhibit
asymmetry, and the others are symmetrical, it's most likely that
the asymmetry is due to closely-spaced peaks of different peak heights. If all peaks have the same or similar asymmetry,
it's more likely to be a broadening factor that applies to the entire
signal. The two figures here provide an example from real
experimental data. On the left, three asymmetrical peaks are each fit
with two symmetrical Gaussians (six peaks total). On the right,
those three peaks are fit with one exponentially broadened Gaussian
each (three peaks total). In this case, the three
asymmetrical peaks all have the same asymmetry and can be fit with the
same time constant ("extra"). Moreover, the fitting error is slightly lower for
the three-peak exponentially broadened fit. Both of these observations argue for the three-peak exponentially
broadened fit rather than the six-peak fit. (Note: for best
results, use peakfit 2.4 and/or ipf 8.5 or later, which correct a bug
in the exponentially broadened Gaussian model).
Appendix b: The Effect of Smoothing before least-squares analysis.
In
general, it is not advisable to smooth a signal before applying
least-squares fitting, because doing so might distort the signal, make it hard to evaluate the
residuals properly, and bias the results of bootstrap sampling estimations of
precision, causing it to underestimate the between-signal variations in peak parameters. SmoothOptimization.m is a Matlab/Octave script that compares the effect of smoothing on the measurements of peak height of a Gaussian peak with a half-width of 166 points, plus white noise with a signal-to-noise ratio (SNR) of 10, using three different methods: (a) simply taking the single point at the center of the peak as the peak height; (b) using the gaussfit method to fit the top half of the peak (see CurveFitting.html#Transforming),
and (c) fitting the entire signal with a Gaussian using the iterative
method. The results of 150 trials with independent white noise samples are shown on the left: a typical raw signal is
shown in the upper left. The other three plots show the effect of the
SNR of the measured peak height vs the smooth ratio (the ratio of
the smooth width to the half-width of the peak) for those three
measurement methods. The results show that the simple
single-point measurement is indeed much improved by smoothing, as is
expected; however, the optimum SNR (which improves by roughly the square
root of the peak width of 166 points) is achieved only when the smooth
ratio approaches 1.0, and that much smoothing distorts the peak shape
significantly, reducing the peak height by about 40%. The curve-fitting
methods are much less effected by smoothing and the iterative method
hardly at all. So the bottom line is that you should not smooth prior to curve-fitting, because it will distort the peak and will
not gain any significant SNR advantage. The only situation where
it might be advantageous so smooth is when the noise in the signal is
high-frequency weighted (i.e. "blue" noise) or if the signal is contaminated with high-amplitude narrow spike artifacts, in which case a median-based pre-filter is useful.