A Matlab peak fitting program for time-series signals, which uses an
unconstrained non-linear optimization
algorithm to decompose a complex, overlapping-peak signal into
its component parts. The objective is to determine whether your
signal can be represented as the sum of fundamental underlying peaks
shapes. Accepts signals of any length, including those with
non-integer and non-uniform x-values. Fits groups of up to six peaks
with eight distinct peak shapes (expandable to other shapes). There
are two different versions,
(1) a command line version (peakfit.m)
for Matlab or Octave,
(2) a keypress operated
interactive version (ipf.m) for Matlab only. The difference between them is
that peakfit.m is completely controlled by command-line input
arguments and returns its information via command-line output
arguments; ipf.m allows interactive control via keypress commands.
For
automating the fitting of large numbers of signals, peakfit.m
is better; ipf.m is best for exploring signals to
determine the
optimum fitting range, peak shapes, number of peaks,
baseline correction mode, etc. Otherwise they have the same
curve-fitting capabilities. The peak shape models available
are illustrated in this graphic. See Notes and Hints for more
information and useful suggestions.
1. Command line function: peakfit.m
(Version 6.2, February, 2015)
Peakfit.mis a user-defined
command window peak fitting function for Matlab or Octave, usable
from a remote terminal. It is written as a self-contained
function in a single m-file. (To view of download, click Peakfit.m). It takes data in the
form of a 2 x n matrix that has the independent variables
(X-values) in row 1 and the dependent
variables (Y-values) in row 2, or as a
single dependent variable vector. The syntax is [FitResults, GOF,
baseline, coeff, BestStart, xi, yi,
BootResults]=peakfit(signal, center, window,
NumPeaks, peakshape, extra, NumTrials, start,
AUTOZERO, fixedwidth, plots,
bipolar,minwidth,DELTA,clipheight)). Only the first
argument, the data matrix, is required. The other
input and output arguments are explained below.
The screen display is shown on
the right; the upper panel shows the data as blue dots, the combined
model as a red line
(ideally overlapping the blue dots), and the model
components as green lines.
The dotted magenta lines
are the first-guess peak positions for the last fit. The
lower panel shows the residuals (difference between the
data and the model).
You can download a ZIP file
containing peakfit.m, DemoPeakFit.m,
ipf.m, Demoipf.m, some sample data for testing, and
a test script (testpeakfit.m)
that runs all the examples
sequentially to test for proper operation. For a discussion of
the accuracy and precision of peak parameter measurement using
peakfit.m, click here.
Version history.
Peakfit.m can be called with several
optional additional arguments. All input arguments (except
the signal itself) can be replaced by zeros to use their
default values.
peakfit(signal);
Performs an iterative least-squares fit of a single
unconstrained Gaussian peak to the entire data matrix "signal",
which has x values in row 1 and Y values in row 2 (e.g. [x y]) or
which may be a single signal vector (in which case the data points
are plotted against their index numbers on the x axis).
peakfit(signal,center,window);
Fits a single unconstrained Gaussian peak to a portion of
the matrix "signal". The portion is centered on the x-value
"center" and has width "window" (in x units).
In this and in all following
examples, set "center" and "window" both to 0to fit the entire
signal.
peakfit(signal,center,window,NumPeaks);
"NumPeaks" = number of peaks in the model (default is 1 if
not specified).
peakfit(signal,center,window,NumPeaks,peakshape);
Specifies the peak shape of the model: "peakshape" =
1-23. (1=unconstrained Gaussian, 2=unconstrained Lorentzian,
3=logistic distribution, 4=Pearson, 5=exponentially
broadened Gaussian; 6=equal-width Gaussians,7=equal-width
Lorentzians, 8=exponentially broadenedequal-width
Gaussians, 9=exponential pulse, 10= up-sigmoid (logistic function),11=fixed-width
Gaussians,12=fixed-width
Lorentzians, 13=Gaussian/Lorentzian
blend; 14=bifurcated Gaussian, 15=Breit-Wigner-Fano
resonance; 16=Fixed-position
Gaussians; 17=Fixed-position Lorentzians; 18=exponentially
broadened Lorentzian; 19=alpha function; 20=Voigt profile;
21=triangular; 22=multiple shapes; 23=down-sigmoid;
25=lognormal distribution; 26=slope (see
Example 28); 28=polynomial (Example 30); 29=articulated linear
segmented (see Example 29). In version 5 and later, the
peakshape can be a vector of different shapes for each peak,
e.g. [1 2 1] for three peaks in a Gaussian, Lorentzian, Gaussian
sequence. (The function ShapeDemo demonstrates
most of the basic peak shapes graphically,
showing the variable-shape peaks as multiple lines). Note:
"unconstrained" simply means that the position, height, and width
of each peak in the model is allowed to vary independently of the
other peaks, as opposed to the equal-width, fixed-width, or
fixed-position variants. peakfit(signal,center,window,NumPeaks,peakshape,extra)
Specifies the value of 'extra', used in the Pearson,
exponentially-broadened Gaussian, Gaussian/Lorentzian
blend, bifurcated Gaussian, and Breit-Wigner-Fano
shapes to fine-tune the peak shape.In version 5,
'extra' can be a vector of different extra values for each peak). peakfit(signal,center,window,NumPeaks,peakshape,
extra, NumTrials);
Performs "NumTrials" trial fits and selects the best one
(with lowest fitting error). NumTrials can be any positive integer
(default is 1).
peakfit(signal,center,window,NumPeaks,peakshape,extra,
NumTrials, start)
Specifies the first guesses vector "firstguess" for the
peak positions and widths, e.g. start=[position1 width1 position2
width2 ...]
peakfit(signal,center,window,NumPeaks,peakshape,extra,
NumTrials, start, autozero)
As above, but "autozero" sets the baseline correction mode in
the last argument: autozero=0
(default) does not subtract
baseline from data segment;. autozero=1 interpolates a linear
baseline from the edges of the data segment and subtracts it
from the signal (assumes that the peak
returns to the baseline at the edges of the signal);
autozero=2, like mode 1 except that it
computes a quadratic curved baseline; autozero=3
compensates for a flat baseline without reference to
the signal itself (does not require that the signal return to
the baseline at the edges of the signal, as does modes 1 and 2).
Coefficients of the polynomial baselines are returned in the
third output argument "baseline".
peakfit(signal,0,0,0,0,0,0,0,2)
Use zeros as placeholders to use the default values of
input arguments. In this case, autozerois set to 2, but all
others are the default values.
peakfit(signal,center,window,NumPeaks,peakshape,extra,NumTrials,
start, autozero, fixedparameters, plots, bipolar)
(12th input argument) 'bipolar' = 0 constrain peaks heights to
be positions; 'bipolar' = 1 allows positive ands negative peak
heights.
peakfit(signal,center,window,NumPeaks,peakshape,extra,NumTrials,
start, autozero, fixedparameters, plots, bipolar, minwidth)
'minwidth' (13th input argument) sets the minimum allowed
peak width. The default if not specified is equal to the x-axis
interval. Must be a vector of minimum widths, one value for
each peak, if the multiple peak shape is chosen.
peakfit(signal,center,window,NumPeaks,peakshape,extra,
NumTrials, start, autozero, fixedparameters, plots, bipolar,
minwidth, DELTA)
'DELTA' (14th input argument) controls the restart variance
when NumTrials>1. Default value is 1.0. Larger values give more
variance. Version 5.8 and later only.
[FitResults,FitError]=
peakfit(signal,center,window...);
Returns the FitResults vector in the order peak number, peak
position, peak height, peak width, and peak area), and the FitError
(the percent RMS difference between the data and the model in the
selected segment of that data) of the best fit.
To display column labels for the FitResults table, execute the
following statements after calling peakfit(...);: disp('
Peak number Position
Height
Width Peak
area');disp(FitResults)
[FitResults,
LowestError, baseline, BestStart, xi, yi, BootstrapErrors] =
peakfit([x;y],0,0,2,6,0,1,0,0,0);
Displays parameter error estimates; See DemoPeakfitBootstrap for a
self-contained demo of this function.
Optional output parameters:
FitResults: a table of model
peak parameters, one row for each peak, listing peak number,
peak position, height, width, and area (or, for shape 28, the
polynomial coefficients, and for shape 29, the x-axis
breakpoints).
GOF (Goodness of Fit), a 2-element vector
containing the RMS fitting error of the best trial fit and the
R-squared (coefficient of determination).
baseline: returns the polynomial coefficients
of the interpolated baseline in linear and quadratic
baseline modes (1 and 2) or the value of the constant baseline
in flat baseline mode.
coeff:
Coefficients for the polynomial fit (shape 28 only; for other
shapes, coeff=0)
BestStart: the starting guesses that gave the
best fit.
xi: vector containing 600 interpolated
x-values for the model peaks.
yi: matrix containing the y values of model
peaks at each xi. Type plot(xi,yi(1,:)) to plot peak 1 or
plot(xi,yi) to plot all the peaks
BootstrapErrors: a
matrix containing bootstrap standard deviations and
interquartile ranges for each peak parameter of each peak in the
fit.
Examples:
Note: test script testpeakfit.mruns all the following examples
automatically; just press Enter to continue to the next one.
Example 1. Fits
computed x vs y data with a single unconstrained Gaussian peak
model. >
x=[0:.1:10];y=exp(-(x-5).^2);peakfit([x' y']) ans =
Peak number Position
Height
Width Peak area
1
5
1
1.665
1.7725
Example 2. Fits small
set of manually-entered y data to a single unconstrained Gaussian
peak model.
> y=[0 1 2 4 6 7 6 4 2 1 0 ];x=1:length(y);
> peakfit([x;y],length(y)/2,length(y),0,0,0,0,0,0) Peak number Position
Height
Width Peak area
1
6.0001
6.9164
4.5213 32.98
Example 3.
Measurement of very noisy peak with signal-to-noise ratio = 1. (Try
several times). > x=[0:.01:10];y=exp(-(x-5).^2)+randn(size(x));peakfit([x;y])
Peak number Peak position
Height
Width Peak area
1
5.0951
1.0699
1.6668 1.8984
Example 4. Fits a noisy two-peak signal with a double unconstrained
Gaussian model (NumPeaks=2). >
x=[0:.1:10];y=exp(-(x-5).^2)+.5*exp(-(x-3).^2)+.1*randn(1,length(x)); > peakfit([x'
y'],5,19,2,1,0,1)
Peak number Position
Height
Width Peak area
1
3.0001
0.49489
1.642
0.86504
2
4.9927
1.0016
1.6597
1.7696
Example 5.Fits a
portion of the humps function, 0.7 units wide and centered on
x=0.3, with a single (NumPeaks=1) Pearson function (peakshape=4)
with extra=3 (controls shape of Pearson function). >
x=[0:.005:1];y=humps(x);peakfit([x' y'],.3,.7,1,4,3); Example 6.Creates
a data matrix 'smatrix', fits a portion to a two-peak unconstrained
Gaussian model, takes the best of 10 trials. Returns
optional output arguments FitResults and FitError. >
x=[0:.005:1];y=(humps(x)+humps(x-.13)).^3;smatrix=[x' y']; >
[FitResults,FitError]=peakfit(smatrix,.4,.7,2,1,0,10)
Peak number Position
Height
Width Peak area
FitResults =1
0.4128 3.1114e+008
0.10448 3.4605e+007
2
0.3161 2.8671e+008 0.098862
3.0174e+007 FitError
= 0.68048 Example 7.As
above, but specifies the first-guess position and width of the two
peaks, in the order [position1 width1 position2 width2] > peakfit([x'
y'],.4,.7,2,1,0,10,[.3 .1 .5 .1]);
Supplying a first guess position and width is also useful
if you have one peak on top of another (like example 4, with
both peaks at the same position x=5, but with different widths): >>
x=[2:.01:8];y=exp(-((x-5)/.2).^2)+.5.*exp(-(x-5).^2)+.1*randn(1,length(x));
>> peakfit([x' y'],0,0,2,1,0,1,[5 2 5 1])
Peak number Position
Height
Width Peak area
1
4.9977
0.51229
1.639 0.89377
2
4.9948
1.0017
0.32878 0.35059
Example 8.As
above,
returns the vector x1 containing 200 interpolated x-values for
the model peaks and the matrix y1 containing the y values of
each model peak at each xi. Type plot(xi,yi(1,:))to plot peak 1 orplot(xi,yi)to plot all peaks. >
[FitResults,LowestError,BestStart,xi,yi]=
peakfit(smatrix,.4,.7,2,1,0,10)
Example 9. Fitting a single
unconstrained Gaussian on a linear background, using the linear
autozero mode
(9th input argument = 1)
>>x=[0:.1:10]';y=10-x+exp(-(x-5).^2);peakfit([x
y],5,8,0,0,0,0,0,1)
Example 10.Fits
a group of three peaks near x=2400 inDataMatrix3with
three equal-width exponentially-broadened Gaussians.
Note: if your peaks are trailing off to the left,
rather that to the right as in the above example, simply use a negative
value for the time constant (in ipf.n, press Shift-X and
type a negative value).
Example
11. Example
of an unstable fit to a signal consisting of two unconstrained
Gaussian peaks of equal height (1.0). The peaks are too highly
overlapped for a stable fit, even though the fit error is
small and the residuals are unstructured. Each time you
re-generate this signal, it gives a different fit, with the
peaks heights varying about 15% from signal to signal. >> x=[0:.1:10]';
y=exp(-(x-5.5).^2)+exp(-(x-4.5).^2)+.01*randn(size(x));
[FitResults,FitError]=peakfit([x y],5,19,2,1)
Peak number
Position
Height
Width Peak area
1
4.4059
0.80119
1.6347
1.3941
2
5.3931
1.1606
1.7697
2.1864
FitError = 0.598 Much more stable results can be
obtained using the equal-width Gaussian model (peakfit([x y],5,19,2,6)),
but that is justified only if the experiment is
legitimately expected to yield peaks of equal width. Seehttp://terpconnect.umd.edu/~toh/spectrum/CurveFittingC.html#Peak_width_constraints.
Example 12.Demonstration
of the four autozero modes, for a single Gaussian on large
baseline, with position=10, height=1, and width=1.66. The
autozero mode is specified by the 9th input argument (0,1,2,
or 3).
Autozero=0 means to
ignore the baseline (default mode if not specified). In this
case, this leads to large errors.
>> x=8:.05:12;y=1+exp(-(x-10).^2); >> [FitResults,FitError,baseline]=peakfit([x;y],0,0,1,1,0,1,0,0)
FitResults =
1
10
1.8561
3.612 5.7641
FitError =5.387
baseline = 0
Autozero=1
subtracts linear baseline from edge to edge. Does not
work well in this case because the signal does not return
completely to the baseline
at the edges. >>
[FitResults,FitError,baseline]=peakfit([x;y],0,0,1,1,0,1,0,1)
FitResults =
1
9.9984
0.96161
1.5586
1.5914
FitError = 1.9801
baseline =
0.0012608
1.0376 Autozero=2 subtracts quadratic baseline from
edge to edge. Does not work well in this case because the
signal does not return
completely to the
baseline at the
edges. >>
[FitResults,FitError,baseline]=peakfit([x;y],0,0,1,1,0,1,0,2)
FitResults =
1
9.9996
0.81762
1.4379 1.2501
FitError = 1.8205
baseline =
-0.046619
0.9327 -3.469
Autozero=3 subtracts a flat baseline
automatically, without requiring that the signal
returns to baseline at the edges. This mode works best for
this signal. >>
[FitResults,FitError,baseline]=peakfit([x;y],0,0,1,1,0,1,0,3) FitResults =
1
10
1.0001
1.6653 1.7645 FitError = 0.0037056 baseline = 0.99985
In the following
case, the baseline is strongly sloped, but straight. In
that case the most accurate result is obtained by using
a 2-peak fit, fitting the baseline with a variable-slope
straight line (shape 26, peakfit version 6 only). >>
x=8:.05:12;y=x+exp(-(x-10).^2); >> [FitResults,FitError]=peakfit([x;y],0,0,2,[1
26],[1 1],1,0) FitResults =
1
10
1
1.6651 1.7642
2
4.485
0.22297
0.05 40.045 FitError =0.093
In the following case, the
baseline is curved, so the best choice is autozero=2:
>> x=[0:.1:10]';y=1./(1+x.^2)+exp(-(x-5).^2);
>> [FitResults,FitError,baseline]=peakfit([x
y],5,5.5,0,0,0,0,0,2)
FitResults =
1
5.0091
0.97108
1.603 1.6569
FitError = 0.97661
baseline = 0.0014928
-0.038196 0.22735
Example 13.Same
as example 4, but withfixed-widthGaussian (shape
11), width=1.666. >>
x=[0:.1:10];y=exp(-(x-5).^2)+.5*exp(-(x-3).^2)+.1*randn(size(x));
>> [FitResults,FitError]=peakfit([x' y'],0,0,2,11,0,0,0,0,1.666)
Peak number Position
Height
Width Peak area
1
3.9943
0.49537
1.666
0.87849
2
5.9924
0.98612
1.666
1.7488
Example 14.Peak
area measurements. Same as the example in Figure 15 onIntegration and Peak Area
Measurement. All four peaks have the same
theoretical peak area (1.772). The four peaks can be fit
together in one fitting operation using a 4-peak Gaussian
model, with only rough estimates of the first-guess positions
and widths. The peak areas thus measured are much more
accurate than the perpendicular drop method: >>
x=[0:.01:18];
>>
y=exp(-(x-4).^2)+exp(-(x-9).^2)+exp(-(x-12).^2)+exp(-(x-13.7).^2);
>> peakfit([x;y],0,0,4,1,0,1,[4 2 9 2 12 2 14
2],0,0) Peak number Position
Height
Width Peak area
1
4
1
1.6651
1.7725
2
9
1
1.6651
1.7725
3
12
1
1.6651
1.7724
4
13.7
1
1.6651
1.7725
This
works well even in the presence of substantial amounts of random
noise: >> x=[0:.01:18];
y=exp(-(x-4).^2)+exp(-(x-9).^2)+exp(-(x-12).^2)+exp(-(x-13.7).^2)+.1.*randn(size(x));
>> peakfit([x;y],0,0,4,1,0,1,[4 2 9 2 12 2 14 2],0,0) Peak number Position
Height
Width Peak area
1
4.0086
0.98555
1.6693
1.7513
2
9.0223
1.0007
1.669
1.7779
3
11.997
1.0035
1.6556
1.7685
4
13.701
1.0002
1.6505
1.7573
Sometimes experimental
peaks are effected by exponential broadening, which does not by
itself change the true peak areas, but does shift peak positions
and increases peak width, overlap, and asymmetry, as shown when
you try to fit the peaks with Gaussians. Using the same
noise signal from above:
>>y1=ExpBroaden(y',-50); >> peakfit([x;y1'],0,0,4,1,50,1,0,0,0)
ans =
1
4.4538
0.83851
1.9744 1.7623
2
9.4291
0.8511
1.9084 1.7289
3
12.089
0.59632
1.542 0.97883
4
13.787
1.0181
2.4016 2.6026
Peakfit.m
(and ipf.m) have an exponentially-broadened Gaussian peak shape
(shape #5) that works better in those cases:, recovering theoriginalpeak positions,
heights, widths, and areas. (Adding a first-guess vector as the
8th argument may improve the reliability of the fit in some
cases). >> y1=ExpBroaden(y',-50); >>
peakfit([x;y1'],0,0,4,5,50,1,[4 2 9 2 12 2 14
2],0,0)
ans=
Peak#
Position
Height
Width Area
1
4
1
1.6651
1.7725
2
9
1
1.6651
1.7725
3
12
1
1.6651
1.7725
4
13.7
0.99999
1.6651
1.7716
Example 15. Displays a table of
parameter error estimates. See DemoPeakfitBootstrap for a
self-contained demo of this function.
>> x=0:.05:9;
y=exp(-(x-5).^2)+.5*exp(-(x-3).^2)+.01*randn(1,length(x));
>> [FitResults,LowestError,baseline,BestStart,xi,yi,BootstrapErrors]=
peakfit([x;y],0,0,2,6,0,1,0,0,0);
Example
16.Fits both
peaks of the Humps function with a Gaussian/Lorentzian blend
(shape 13) that is 15% Gaussian (Extra=15). The 'Extra' argument sets the percentage
of Gaussian shape. >>
x=[0:.005:1];y=humps(x);[FitResults,FitError]= peakfit([x'
y'],0.54,0.93,2,13,15,10,0,0,0) FitResults =
1
0.30078
190.41
0.19131
23.064
2
0.89788
39.552
0.33448
6.1999 FitError = 0.34502 Example 17. Fit a
slightly asymmetrical peak with a bifurcated Gaussian (shape
14). The 'Extra' argument (=45) controls the peak asymmetry (50
is symmetrical).
>>
x=[0:.1:10];y=exp(-(x-4).^2)+.5*exp(-(x-5).^2)+.01*randn(size(x)); >>
[FitResults,FitError]=peakfit([x' y'],0,0,1,14,45,10,0,0,0)
FitResults =
1
4.2028
1.2315
4.077
2.6723 FitError =0.84461
Example 18. Example 1 without plotting or command
window printing (11th input argument = 0, default is 1)
>> x=[0:.1:10]';y=exp(-(x-5).^2);peakfit([x
y],0,0,1,1,0,0,0,0,0,0)
Example 19.Same
as example 4, but withfixed-positionGaussian (shape 16),
positions=[3 5]. >>
x=[0:.1:10];y=exp(-(x-5).^2)+.5*exp(-(x-3).^2)+.1*randn(size(x));
>> [FitResults,FitError]=peakfit([x' y'],0,0,2,16,0,0,0,0,[3
5])
FitResults =
1
3
0.49153
1.6492
0.86285
2
5
1.0114
1.6589
1.786
FitError =8.2693
Example 20.Exponentially
modified Lorentzian (shape 18) with added noise. As for peak shape
5, peakfit.m recovers the original peak position (9), height (1),
and width (1). >> x=[0:.01:20];
L=lorentzian(x,9,1)+0.02.*randn(size(x));L1=ExpBroaden(L',-100);
>> peakfit([x;L1'],0,0,1,18,100,1,0,0)
Example 21. “Humps” function fit
with two Voigt profiles (shape 20), alpha=1.7, flat baseline
mode (3).
[FitResults,FitError]=peakfit(humps(0:.01:2),
71, 140, 2, 20, 1.7, 1, ...[31 4.7 90 8.8],
3)
Example 22. peakfitdemob.m.
Illustrated on the right. Measurement of
three weak Gaussian peaks at x=100, 250, 400, superimposed in a very
strong Gaussian baseline plus noise. The peakfit function fits four
peaks, treating the baseline as a 4th peak whose peak position is
negative. This requires specifying a "start" vector, like Example 7.
You can test the reliability of this method by changing the peak
parameters in lines 11, 12, and 13 and see if the peakfit function
will successfully track the changes and give accurate results for
the three peaks without having to change the start vector. See Example 9 on iSignal.html for
other ways to handle this signal. (The true peaks heights are 1, 2,
and 3, respectively).
Example 23. 12^{th}
input argument (+/- mode) set to 1 (bipolar) to allow negative as
well as positive peak heights. (Default is 0) >>
x=[0:.1:10];y=exp(-(x-5).^2)-.5*exp(-(x-3).^2)+.1*randn(size(x)); >> peakfit([x' y'],0,0,2,1,0,1,0,0,0,1,1) FitResults =
1
3.1636
-0.5433
1.62 -0.9369
2
4.9487
0.96859
1.8456 1.9029 FitError =8.2757
Example 24.
Version 5 or later. Fits humps function to a model
consisting of one Lorentzian and one Gaussian peak. >>
x=[0:.005:1.2];y=humps(x);[FitResults,FitError]=peakfit([x'
y'],0,0,2,[2 1],[0 0]) FitResults =
1
0.30178
97.402
0.18862 25.116
2
0.89615
18.787
0.33676 6.6213 FitError = 1.0744
Example 25. Version
5 or later: Five peaks, five different shapes, all
heights = 1, all widths = 3, "extra" vector included for peaks 4
and 5. x=0:.1:60; y=modelpeaks2(x,[1 2 3
4 5],[1 1 1 1 1],[10 20 30 40 50],[3 3 3 3 3],[0 0 0 2
-20])+.01*randn(size(x)); peakfit([x' y'],0,0,5,[1 2 3 4 5],[0
0 0 2 -20])
You can also use this technique to create models with all the same
shapes but with different values of 'extra' using a vector
of 'extra' values, or (in version 5.7) with different minimum width
restrictions by using a vector of 'minwidth' values as input
argument 13.
Example 26. Version 5.7 or later: Minimum width
constraint (13th input argument) >> x=1:30;y=gaussian(x,15,8)+.05*randn(size(x));
No constraint (minwidth=0): peakfit([x;y],0,0,5,1,0,10,0,0,0,1,0,0);
Widths constrained to values 7 or above: peakfit([x;y],0,0,5,1,0,10,0,0,0,1,0,7);
Example 27. Noise test with very noisy peak signal: peak
height = RMS noise = 1.
>> x=[-10:.05:10];y=exp(-(x).^2)+randn(size(x));P=peakfit([x;y],0,0,1,1,0,10);
Example 28: Gaussian peak on strong sloped
straight-line baseline, 2-peak fit with one Gaussian and one
variable-slope straight line ('slope', shape 26, peakfit version 6
only). >> x=8:.05:12;y=x+exp(-(x-10).^2); >>
[FitResults,FitError]=peakfit([x;y],0,0,2,[1 26],[1 1],1,0) FitResults
=
1
10
1
1.6651 1.7642
2
4.485
0.22297
0.05 40.045 FitError =0.093 Example 29: Segmented linear fit (Shape 29, peakfit
version 6 only). You specify the number of segments in the 4^{th}
input argument ('NumPoints') and the program attempts to find the
optimum x-axis positions of the breakpoints that minimize
the fitting error. The vertical dashed magenta lines
mark the x-axis breakpoints. Another example with a single
Gaussian band . >> x=[0.9:.005:1.7];y=humps(x); >> peakfit([x' y'],0,0,9,29,0,10,0,0,0,1,1)
Example 30: Polynomial fit (Shape 27, peakfit version 6
only). Specify the order of the polynomial (any positive integer) in
the 6^{th} input argument ('extra'). (The 12th input
argument, 'bipolar', is set to 1 to plot the entire y-axis range
when y goes negative). The Matlab/Octave script NumPeaksTest.m
demonstrates this idea with noisy computer-generated signals
containing a user-selected 3, 4, 5 or 6 underlying peaks, which it
fits to a series of models containing 1 to 10 model peaks, plots the
fitting errors vs the number of model peaks and then determines the
vertex of the best-fit parabola; the nearest integer is
usually the correct number of peaks underlying peaks. >> x=[0.3:.005:1.7];y=humps(x);y=y+cumsum(y); >> peakfit([x' y'],0,0,1,28,6,10,0,0,0,1,1)
Example 31: The Matlab/Octave script NumPeaksTest.m demonstrates one way to
determine the minimum number of model peaks needed to fit a
set of data, plotting the fitting error vs the number of model
peaks, and looking for the point at which the fitting error
reaches a minimum. This script creates a
noisy computer-generated signal containing a user-selected 3, 4, 5
or 6 underlying peaks, fits to a series of models containing 1 to 10
model peaks, plots the fitting errors vs the number of model peaks
and then determines the vertex of the best-fit parabola; the
nearest integer is usually the correct number of peaks underlying
peaks. Also requires that the plotit.m
function be installed.
Note: to display the FitResults table with column
labels, call peakfit.m with output arguments [FitResults...] and
type : disp(' Peak number
Position
Height
Width Peak
area');disp(FitResults)
How do you find the right input arguments for peakfit?
If you have no idea where to start, you can use the Interactive Peak Fitter
(ipf.m) to quickly try different fitting regions, peak
shapes, numbers of peaks, baseline correction modes, etc. When you
get a good fit, you can press the "W" key to print out the
command line statement for peakfit.m that will perform that fit in
a single line of code, with or without graphics.
Demonstration script for peakfit.m
DemoPeakFit.m is a demonstration
script for peakfit.m. It generates an overlapping Gaussian peak
signal, adds normally-distributed noise, fits it with the peakfit.m function (in line 78), repeats
this many times ("NumRepeats" in line 20), then compares the peak
parameters (position, height, width, and area) of the
measurements to their actual values and computes accuracy
(percent error) and precision (percent relative standard
deviation). You can change any of the initial values in lines
13-30. Here
is a typical result for a two-peak signal with Gaussian peaks:
Percent errors of measured
parameters:
Position
Height
Width Area 0.048404
0.07906
0.12684 0.19799
0.023986 0.38235
-0.36158 -0.067655
Average Percent Parameter Error for all peaks:
0.036195
0.2307
0.24421 0.13282
In these results, you can see that the accuracy and precision (%RSD)
of the peak positionmeasurements are
always the best, followed by peak height, and then the peak width and peak
area, which
are usually the worst.
DemoPeakFitTime.m is a simple script
that demonstrates how to apply multiple curve fits to a signal
that is changing with time. The signal contains two noisy Gaussian
peaks (similar to the illustration at the right) in which the peak
position of the second peak increases with time and the
other parameters remain constant (except for the noise). The
script creates a set of 100 noisy signals (on line 5) containing
two Gaussian peaks where the position of the second peak changes
with time (from x=6 to 8) and the first peak remains the
same. Then it fits a 2-Gaussian model to each of those
signals (on line 8), stores the FitResults in a 100 × 2 × 5
matrix, displays the signals and the fits graphically with time (click to play animation), then
plots the measured peak position of the two peaks vs time on line
12.
Which to
use: iPeak, iSignal, or Peakfit?Read this comparison of all three.
Or download these Matlab demos that compare iPeak.m with
Peakfit.m for signals with a few peaks and
signals with many peaks and that shows how
to adjust iPeak to detect broad or narrow peaks.
DemoPeakfitBootstrap
demonstrates the ability of peakfit to compute estimates of the
errors in the measured peak parameters. These are self-contained
demos that include all required Matlab functions. Just place them
in your path and click Run
or type their name at the command prompt. Or you can
download all these demos together in idemos.zip.
peakfitVSfindpeaks.m
performs a direct comparison of the peak parameter accuracy of
findpeaks vs peakfit.
2. Automatically
finding and Fitting Peaks.
findpeaksfit.m
is essentially a combination of findpeaks.m
and peakfit.m. It
uses the number of peaks found and the peak positions and widths
determined by findpeaks as input for the peakfit.m function, which
then fits the entire signal with the specified peak model.
This combination function is more convenient that using findpeaks
and peakfit separately. It yields better values that findpeaks
alone, because peakfit fits the entire peak, not just the top
part, and because it deals with non-Gaussian and overlapped peaks.
However, it fits only those peaks that are found by findpeaks, so
you will have to make sure that every peak that contributes to
your signal is located by findpeaks. The syntax is
function [P,FitResults,LowestError,BestStart,xi,yi] =
findpeaksfit(x, y, SlopeThreshold, AmpThreshold,
smoothwidth, peakgroup, smoothtype, peakshape, extra,
NumTrials, autozero, fixedparameters, plots)
The first seven input arguments are exactly the same as for the findpeaks.m
function; if you have been using findpeaks or iPeak to find and
measure peaks in your signals, you can use those same input
argument values for findpeaksfit.m. The remaining six input
arguments of findpeaksfit.m are for the peakfit function; if you
have been using peakfit.m or ipf.m
to fit peaks in your signals, you can use those same input
argument values for findpeaksfit.m. Type "help findpeaksfit" for
more information. This function is included in the ipf11.zip distribution.
ipf.m
is a version of the peak
fitter for x,y data that uses keyboard commands and the mouse
cursor. It is written as a self-contained Matlab function,
in a single m-file. The flexible input syntax allows you to
specify the data as separate x,y vectors or as a 2xn matrix, and
to optionally define the initial focus by adding “center” and
“window” values as additional input arguments, where 'center' is
the desired x-value in the center of the upper window and
“window” is the desired width of that window. For
example:
1 input argument: ipf(y)
or ipf([x;y]) or ipf([x;y]');
2 input arguments:
ipf(x,y) or ipf([x;y],center) or ipf([x;y]',center);
3 input arguments:
ipf(x,y,center) or ipf(y,center,window) or
ipf([x;y],center,window) or ipf([x;y]',center,window);
4 input arguments:
ipf(x,y,center,window);
Like
iPeak
and iSignal, ipf starts out by showing the entire signal
in the lower panel and the selected region in the upper
panel (adjusted by the same cursor controls keys as iPeak
and iSignal). After performing a fit (figure on the
right), the upper panel shows the data as blue dots, the total model as a red line(ideally overlapping
the blue dots), and
the model components as green lines. The dotted magenta lines are the
first-guess peak positions for the last fit. The lower panel shows the residuals
(difference between the data and the model).Important note:
Make sure you don't click on the “Show Plot Tools” button in
the toolbar above the figure; that will
disable normal program functioning. If you do; close the
Figure window and start again.
Recent version history.
Version 11.1 adds minimum peak width constraint (Shift-W);
adds saturation maximum (Shift-M) to
ignore points above saturation maximum. Version 11 adds
polynomial fitting (Shift-o key). Version
10.7 corrects bugs in equal-width Lorentzians (shape 7) and
in the bipolar (+ and -) mode. Version 10.5,
August 2014 adds Shift-Ctrl-S and Shift-Ctrl-P
keys to transfer the current signal to iSignal and iPeak,
respectively, if those functions are installed in your Matlab
path; Version 10.4, June, 2014. Moves fitting result text to
bottom panel of graph. Log mode: (M key) toggles log mode
on/off, fits log(model) to log(y). Replaces bifurcated
Lorentzian with the Breit-Wigner-Fano resonance peak (Shift-B
key); see http://en.wikipedia.org/wiki/Fano_resonance.
Ctrl-A selects all; Shift-N negates signal.
Version 10 adds multiple-shape models; Version
9.9 adds '+=' key to flip between
+ (positive peaks only) and bipolar (+/- peaks) modes; Version
9.8 adds Shift-C to specify custom first guess
('start'). Version 9.7 adds Voigt profile shape. Version
9.6 added an additional
autozero mode that subtracts a flat baseline without
requiring that the signal return to the baseline at both
ends of the signal segment. Version 9.5 added
exponentially broadened Lorentzian (peak shape 18) and alpha
function (peak shape 19);Version 9.4: added bug fix for height
of Gaussian/ Lorentzian blend shape; Version 9.3 added Shift-S
to save Figure as a png graphic to the current directory.
Version 9.2: bug fixes; Version 9.1 added fixed-position
Gaussians (shape 16) and fixed-position Lorentzians (shape 17)
and a peak shape selection menu ( activated by the '_-'
key).
Demoipf.m is a demonstration script for
ipf.m, with a built-in simulated signal
generator. To download these m-files,
right-click on these links, select Save
Link As..., and click Save.
You can also download a ZIP file containing
ipf.m, Demoipf.m, and peakfit.m.
Example 1: Test with pure Gaussian function, default settings
of all input arguments.. >> x=[0:.1:10];y=exp(-(x-5).^2);ipf(x,y)
In this example, the fit is essentially perfect, no matter what are
the pan and zoom settings or the initial first-guess (start) values.
However, the peak area (last fit result reported) includes only the
area within the upper window, so it does vary. (But if there
were noise in the data or if the model were imperfect, then all
the fit results will depend on the exact the pan and zoom
settings).
Example 2: Test with
"center" and "window" specified. >>
x=[0:.005:1];y=humps(x).^3; >> ipf(x,y,0.335,0.39)
focuses on first peak >> ipf(x,y,0.91,0.18) focuses on
second peak Example 3: >> x=1:.1:1000;y=sin(x).^2;ipf(x,y,843.45,5) Isolates a narrow segment
toward the end of the signal.
KEYBOARD
CONTROLS (Version 11):
Pan signal left and right...Coarse: <
and >
Fine: left and right cursor arrow
Nudge: [ ]
Zoom in and out.............Coarse zoom: ?/
and "'
Fine zoom: up and down
arrow
keys
Select entire signal........Crtl-A (Zoom
all the way out)
Resets pan and zoom.........ESC
Select # of peaks...........Number keys 1-9,
or press 0 key to
enter
number manually
Select peak shape from menu - (minus or
hyphen), then type
number or shape vector and Enter
Select peak shape by key....gunconstrained
Gaussian h equal-width Gaussians Shift-G fixed-width Gaussians Shift-P fixed-position Gaussians Shift-H bifurcated Gaussians (a,z keys adjust shape) e Exponential-broadened Gaussian
(a,z
keys adjust broadening) j exponential-broadened
equal-width Gaussians
(a,z keys adjust broadening) lunconstrained
Lorentzian :; equal-width Lorentzians Shift [ fixed-position Lorentzians E (Shift-E) Exponential-broadened
Lorentzians
(a,z keys adjust broadening) L (Shift-L) Fixed-width
Lorentzians (a,z keys adjust
broadening) o LOgistic distribution (Use
Sigmoid for logistic function) p Pearson (a,z keys adjust shape) u exponential pUlse
y=exp(-tau1.*x).*(1-exp(-tau2.*x)) Shift-U Alpha: y=(x-tau2)./
tau1.*exp(1-(x-tau2)./tau1) s Up Sigmoid (logistic function):
y=.5+.5*erf((x-tau1)/sqrt(2*tau2))
Shift-D Down Sigmoid
y=.5-.5*erf((x-tau1)/sqrt(2*tau2)) ~` Gauss/Lorentz blend (a/z keys
adjust fraction of Gaussian) Shift-V Voigt profile (a/z adjusts
shape) Shift-B
Breit-Wigner-Fano (a/z adjusts Fano factor)
Fit.........................f
Select autozero mode........t selects
none, linear,
quadratic, or flat baseline mode
+ or +/- peak mode..........+= Flips
between + peaks only and
+/- peak mode
Invert (negate) signal......Shift-N
Toggle log y mode OFF/ON....m Log
mode plots and fits
log(model) to log(y).
2-point Baseline............b, then click
left and right
baseline
Set manual baseline.........Backspace, then
click baseline at
multiple points
Restore original baseline...|\ to
cancel previous background
subtraction
Cursor start positions......c, then click
on each peak
position
Type in start vector........Shift-C Type or
Paste start vector
[p1 w1 p2 w2 ...]
Print current start vector..Shift-Q
Enter value of 'Extra'......Shift-x, type
value (or vector of
values
in brackets), press Enter.
Adjust 'Extra' up/down......a,z: 5%
change; upper case A,Z:
0.5% change.
Print parameters & results..q
Print fit results only......r
Compute bootstrap stats.....v
Estimates standard deViations
of parameters.
Fit single bootstrap........n
Extracts and Fits siNgle
bootstrap sub-sample.
Plot signal in figure 2.....y
Print model data table......d
Refine fit..................x Takes best of
10 trial fits
(change number in line 219)
Print peakfit function......w Print
peakfit function with all
input arguments
Save Figure as png file.....Shift-S
Saves as Figure1.png,
Figure2.png, etc.
Display current settings....Shift-?
displays list of current
settings
Fit polynomial to segment...Shift-o
Asks for polynomial
order
Enter minimum width.........Shift-W Constrains
peak widths to
a specified minimum.
Enter saturation maximum....Shift-MIgnores points above a
specified saturation maximum.
Switch to iPeak.............Shift-Ctrl-P
transfers current
signal to iPeak.m
Switch to iSignal...........Shift-Ctrl-S
transfers current
signal to iSignal.m
(The
function ShapeDemo demonstrates
the basic peak shapes graphically,
showing the variable-shape peaks as multiple lines)
Practical examples with
experimental data:
1. In this example, a sample of room air is
analyzed by gas chromatography (data
source). The resulting chromatogram (right) shows two
slightly overlapping peaks, the first for oxygen and the second
for nitrogen. The area under each peak is presumed to be
proportional to the gas composition. The peaks are visibly
asymmetric, but an exponentially-broadened Gaussian (a
commonly-encountered peak shape in chromatography) gives a good
fit to the data, using ipf.m and adjusting the exponential term
with the A and Z keys to get the best fit. The results, shown in
the ipf.m screen on the right and in the Q-key report below,
show that the peak areas are in a ratio of 23% and 77%. This is
fairly close to the actual 21% and 78% composition, and the
results would have been even more accurate if argon were
included or if the areas were calibrated for the different
detector responses of nitrogen and oxygen.
Percent Fitting
Error = 2.9318% Elapsed time = 11.5251 sec. Peak# Position Height
Width Area 1 4.8385
17762 0.081094 1533.2 2 5.1439
47142 0.10205
5119.2
2. In this example, pan
and zoom controls are used to isolate a segment of a
chromatogram that contains three weak peaks near 5.8
minutes The lower plot shows the whole chromatogram and the upper plot shows
the segment. Only the peaks in that segment are
subjected to the fit. Pan and zoom are adjusted so that the signal
returns to the local baseline at the ends of the
segment.
Pressing T
selects autozero mode 1, causing the program to subtract a
linear baseline from these data points. Pressing 3, E, F performs a 3-peak
exponentially-broadened Gaussian fit (a common peak shape
in chromatography). The A
and Z keys are
then used to adjust the time constant ("Extra") to obtain
the best fit. The residuals (bottom panel) are random and
exhibit no obvious structure, indicating that the fit is
as good as is possible at this noise level.
3. The accuracy of peak position
measurement can be good even if the fitting error is rather
poor. In this example, an experimental high-resolution atomic
emission spectrum is examined in the region of the well-known
spectral lines of the element Sodium. Two lines are found
there (figure on the right), and when fit to a unconstrained
Lorentzian or Gaussian model, the peak wavelengths are determined
to be 588.98 nm and 589.57 nm.
Percent Fitting
Error 6.9922%
Peak# Position
Height
Width Area
1
588.98
234.34
0.16079 56.473
2
589.57
113.18
0.17509 29.63
Compare this to the ASTM recommended
wavelengths for this element (588.995 and 589.59 nm) and you
can see that the error is no greater than 0.02 nm, which is less
that the interval between the data points (0.05 nm). This
despite the fact that the fit is not particularly good, because the
peaks shapes are rather oddly shaped, perhaps by self-absorption.
This high degree of absolute accuracy compared to a reliable
exterior standard is a testament to the excellent wavelength
calibration of the instrument on which these experimental data were
obtained, but it also shows that peak position is by far the most
precisely measurable parameter in peak fitting, even when the curve
fit is not particularly good. The bootstrap standard deviation
estimates calculated by ipf.m for both wavelengths is 0.015 nm
(see #17 in the next section), so using the 2 x standard deviation
rule-of-thumb would have predicted a probable error within 0.03 nm.
(An even lower fitting error can be achieved by fitting to 4 peaks, but the position
accuracy of the larger peaks remains virtually unchanged).
4. In the first and third examples, the number of peaks in
the model was suggested by the data and by the expectations of each
of those experiments (two major gasses in air; sodium has a
well-known doublet at that wavelength). In the second example, no a
priori expectation of number of peaks was available, but the
data suggested three obvious peaks, and the residuals were random
and unstructured with a 3-peak model, suggesting that no additional
models peaks were needed. In many cases, however, the number
of model peaks is not so clearly indicated. For example, in this previously described example,
the fitting error keeps getting lower as more peaks are added to the
model, yet the residuals remain "wavy" and never become random.
Without further knowledge of the experiment, it's impossible to know
which are the "real peaks" and what is just "fitting the noise".
At the command line, typeipf(x,y),
(x = independent variable, y = dependent variable) or ipf(datamatrix)
where "datamatrix" is a matrix that hasx values in row or column 1 and y values
in row or column 2. Or
if you have only one signal vector y, type ipf(y).
You may optionally add to additional numerical
arguments: ipf(x,y,center,window); where
'center' is the desired x-value in the center of the upper
window and “window” is the desired width of that
window.
Use the four cursor arrow
keys on the keyboard to pan and zoom the signal to
isolate the peak or group of peaks that you want to fit in the
upper window. (Use the <
and > and ? and " keys for coarse pan
and zoom and the square bracket keys [ and ]
to nudge one point left and right). The curve fitting
operation applies only to the segment of the signal shown in
the top plot. The bottom plot shows the entire
signal. Try not to get any undesired peaks in the upper window
or the program may try to fit them. To select the entire signal,
press Ctrl-A.
Press the number keys (1–
9) to choose the number
of model peaks, that is, the minimum number of peaks that you
think will suffice to fit this segment of the signal. For more
than 9 peaks, press 0,
type the number, and press Enter.
Select the desired model peak shape
by pressing the '-' key and selecting the desired shape by
number from the list that is displayed. To select a
multi-shape model, type in a vector of shape numbers,
including the square brackets (e.g. [1 1 2]). Alternatively,
for single shape models, you can select peak shapes directly
in one keystroke by pressing the following keys: unconstrained
Gaussian (lower case g),
Equal-width Gaussians (lower case h), Fixed-width Gaussians (Shift-G), fixed-position
Gaussians (Shift-P),
exponentially-broadened Gaussian (e), exponentially-broadened equal-width
Gaussians (j);
bifurcated Gaussian (Shift-H),
unconstrained Lorentzian (lower case L);
Exponential-broadened Lorentzians (Shift-E);
Fixed-width Lorentzians (upper
case L), fixed-position Lorentzians (Shift [ ), Equal-width
Lorentzians (lower case ;);
Voigt (Shift-V); Shift-B Breit-Wigner-Fano;
Logistic distribution (lower case o), Pearson (lower case p), exponential pulse (u); Alpha function (Shift-U);
up sigmoid or logistic function (s); down
sigmoid (Shift-D), and Gaussian/Lorentzian blend (`) models.
If the peak widths of each group of peaks is expected
to be the same or nearly so, select the "equal-width" shapes.
If the peak widths or peak positions are known
from previous experiments, select the "fixed-width" or
"fixed position" shapes. These more
constrained fits are faster, easier, and much more
stable than regular all-variable fits, especially if the
number of model peaks is greater than 3 (because there are
fewer variable parameters for the program to adjust - rather
than a independent value for each peak).
A set of vertical dashed lines are shown on the plot, one for
each model peak. Try to fine-tune the Pan and Zoom
keys so that the signal goes to the baseline at both ends of the
upper plot and so that the peaks (or bumps) in the signal
roughly line up with the vertical dashed lines, as show in the
figures on the left. This does not have to be exact.
If you want to allow negative peaks as well as positive peaks,
press the + key to flip to the +/- mode (indicated by
the +/- sign in the y-axis label of the upper panel. Press it
again to return to the + mode (positive peaks only). You can
switch between these modes at any time. To negate the
entire signal, press Shift-N.
PressF to
initiate the curve-fitting calculation. Each time you press F, another fit of the
selected model to the data is performed with slightly different
starting positions, so that you can judge the stability of the
fit with respect to changes in starting first guesses. Keep your
eye on the residuals plot and on the "Error %" display. Do this several times, trying
for the lowest error and the most unstructured random residuals
plot. At any time, you can adjust the signal region to be fit
(step 2), the baseline position (step 9 and 10), change the
number or peaks (step 3), or peak shape (step 4). If the fit
seems unstable, try pressing the X key a few times (see
#13, below),
The model parameters of the last fit are displayed in the
upper window. For example, for a 3-peak fit: Peak# Position
Height Width Area 1
5.33329 14.8274 0.262253 4.13361 2
5.80253 26.825 0.326065 9.31117 3
6.27707 22.1461 0.249248 5.87425
The column are, left to right: the peak number, peak position,
peak height, peak width, and the peak area. (Note: for
exponential pulse (U)
and sigmoid (S) shapes,
Position and Width are replaced by Tau1 and Tau2). Press R to print this table out in
the command window. Peaks are numbered from left to right. (The
area of each component peak within the upper window is computed
using the trapezoidal method and displayed after the width).
Pressing Q prints out a
report of settings and results in the command window, like so:
Peak Shape = Gaussian
Number of peaks = 3
Fitted range = 5 - 6.64
Percent Error = 7.4514 Elapsed time = 0.19741
Sec.
Peak# Position
Height
Width Area
1
5.33329
14.8274
0.262253 4.13361
2
5.80253
26.825
0.326065 9.31117
3
6.27707
22.1461
0.249248 5.87425
To select the baseline correction mode, press the T key repeatedly; it cycles
thorough four modes: No baseline correction, linear
baseline subtraction, quadratic baseline
subtraction, and flat baseline correction. When
baseline subtraction is linear, a straight-line baseline
connecting the two ends of the signal segment in the upper panel
will be automatically subtracted. When baseline subtraction
is quadratic, a parabolic baseline connecting the two ends of
the signal segment in the upper panel will be automatically
subtracted. Use the quadratic baseline correction if the
baseline is curved, as in these examples:
Autozero OFF
Linear autozero
Quadratic autozero
If you prefer to set the baseline manually, press theB key, then click
on the baseline to the LEFT the peak(s), then click on the
baseline to the RIGHT the peak(s). The new baseline will be
subtracted and the fit re-calculated. (The new baseline remains
in effect until you use the pan or zoom controls).
Alternatively, you may use the multipoint background correction
for the entire signal: press the Backspace key, type in the desired number of
background points and press the Enter key, then click on the baseline starting
at the left of the lowest x-value and ending to the right of the
highest x-value. Press the \
key to restore the previous background to start over.
In some cases it will help to manually specify the
first-guess peak positions: pressC, then click on your estimates of the
peak positions in the upper graph, once for each peak. A
fit is automatically performed after the last click. Peaks are
numbered in the order clicked. Alternatively, you can type Shift-C
and then type in or Paste in the start vector, complete with
square brackets, e.g. “[pos1 wid1 pos2 wid2 ...]” where "pos1"
is the position of peak 1, "wid1" is the width of
peak 1, and so on for each peak. The custom start values remain
in effect until you change the number of peaks or use the pan or
zoom controls. Hint: if you Copy the start vector and keep it in
the Paste buffer, you can use the Shift-C key and Paste
it back in after changing the pan or zoom controls. Note:
It's possible to click beyond the x-axis range, to try
to fit a peak whose maximum is outside the x-axis range
displayed. This is useful when you want to fit a curved baseline
by treating it as an additional peak whose peak position if
off-scale (example).
The A and Z keys control the "shape"
parameter ('extra') that is used only if you are using the Voigt
profile, Pearson, exponentially-broadened Gaussian
(ExpGaussian), exponentially-broadened Lorentzian
(ExpLorentzian), bifurcated Gaussian, Breit-Wigner-Fano,
or Gaussian/Lorentzian blend. For the Voigt profile, the "shape"
parameter controls alpha, the ratio of the Lorentzian
width to the Doppler width. For the Pearson shape, a value of
1.0 gives a Lorentzian shape, a value of 2.0 gives a shape
roughly half-way between a Lorentzian and a Gaussian, and a
larger values give a nearly Gaussian shape. For the
exponentially broadened Gaussian shapes, the "shape" parameter
controls the exponential "time constant" (expressed as the
number of points). For the Gaussian/Lorentzian blend and
the bifurcated Gaussian shape, the "shape" parameter
controls the peak asymmetry (a values of 50 gives a symmetrical
peak). For the Breit-Wigner-Fano, it controls the Fano
factor. You can enter an initial value of the
"shape" parameter by pressing Shift-X , typing in a value, and
pressing the Enter key.
For multi-shape models, enter a vector of "extra" values,
one for every peak, enclosed in square brackets. For
single-shape models, you can adjust this value using the A and Z keys (hold down the Shift key to fine tune).
Seek to minimize the Error % or set it to a
previously-determined value. Note: if fitting multiple
overlapping variable-shape peaks, it's easier to fit a single
peak first, to get a rough value for the "shape" parameter
parameter, then just fine-tune that parameter for the multipeak
fit if necessary.
For difficult fits, it may help to press X, which performs ten silent
trial fits with slightly different first guesses and takes the
one with the lowest fitting error. (You can change the number of
trials,"NumTrials", in or near line 224 - the default value is
10). The peak positions and widths resulting from this
best-of-10 fit then become the starting points for subsequent
fits, so the fitting error should gradually get smaller
and smaller is you press X again and again, until it
settles down to a minimum. If none of the 10 trials gives a
lower fitting error than the previous one, nothing is changed.
Those starting values remain in effect until you change the
number of peaks or use the pan or zoom controls. (Remember: equal-width
fits, fixed-width
fits, and fixed position shapes are both faster, easier,
and much more stable than regular variable fits, so use
equal-width fits whenever the peak widths are expected to be
equal or nearly so, or fixed-width (or fixed position) fits when
the peak widths or positions are known from previous
experiments).
Press Y to display the
entire signal full screen
without cursors, with the last fit
displayed in green. The residual is displayed in red, on the
same y-axis scale as the entire signal.
Press M
to switch back and forth between log and linear modes. In
log mode, the y-axis of the upper plot switches to
semilog-y, and log(model) is fit to log(y), which may be
useful if the peaks vary greatly in amplitude.
Press the D
key to print out a table of model data in the command
window: x, y1, y2, ..., where x is the column of x values of
the fitted region and the y's are the y-values of each
component, one for each peak in the model. You can then Copy
and Paste this table into a spreadsheet or data plotting
program of your choice.
Press W
to print out the peakfit.m
function with all input
arguments, including the last best-fit values of the first
guess vector. You can copy and
paste the peakfit.m function into your own code or into the
command window, then replace "datamatrix" with you own x-y
matrix variable.
Bothipf.m,
and peakfit.m are able to estimate the
expected variability of thepeak
position, height, width, and area from the signal, by using
thebootstrap sampling method. This
involves
extracting 100 bootstrap samples from the signal, fitting each
of those samples with the model, then computing the percent
relative standard deviation (RSD) and the interquartile range
(IQR) of the parameters of each peak. Basically this method
calculates weighted fits of a single data set, using a
different set of different weights for each sample. This
process is computationally intensive can take several minutes
to complete, especially if the number of peaks in the
model and/or the number of points in the signal are high.
To activate this process in
ipf.m, press the V key. It first asks you to type in the
number of "best-of-x" trial fits per bootstrap sample (the
default is 1, but you may need higher number here if the fits
are occasionally unstable; try 5 or 10 here if the initial
results give NaNs or wildly improbable numbers). (To activate
this process in peakfit.m, you must use version 3.1 or later and
include all six output arguments, e.g. [FitResults, LowestError,
BestStart, xi, yi, BootstrapErrors]=peakfit...). The
results are displayed as a table in the command window. For
example, for a three-peak fit (to the same three peaks used by
the Demoipf demonstration script described in the next section)
and using 10 as the number or trials:
Number of fit trials per
bootstrap sample (0 to cancel): 10 Computing bootstrap
sampling statistics....May take several minutes.
Peak
#1
Position Height
Width Area
Bootstrap Mean: 800.5387
2.969539 31.0374 98.10405
Bootstrap
STD: 0.20336
0.02848 0.5061
1.2732
Bootstrap
IQR: 0.21933
0.027387 0.5218
1.1555
Percent
RSD: 0.025402
0.95908 1.6309
1.2978
Percent
IQR: 0.027398
0.92226 1.6812
1.1778
Peak
#2
Position Height
Width Area
Bootstrap
Mean: 850.0491 1.961368
36.4809 75.98684
Bootstrap
STD: 6.4458
0.14622
3.0657 4.7596
Bootstrap
IQR: 0.52358
0.058845
1.4303 3.9161
Percent
RSD: 0.75828
7.4549
8.4035 6.2637
Percent
IQR: 0.061594
3.0002
3.9205 5.1537 etc ....
Elapsed time is 98.394381 seconds.
Min/Max Fitting Error of the 100 bootstrap samples:
3.0971/2.5747
Observe that the RSD of the peak position is best
(lowest), followed by height and width and area. This is a typical
pattern, seen before. Also, be aware that the reliability of
the computed variability depends on the assumption that the noise
in the signal is representative of the average noise in repeated
measurements. If the number of data points in the signal is small,
these estimates can be very approximate.
A likely pitfall with the bootstrap method when applied to
iterative fits is the possibility that one (or more) of the
bootstrap fits will go astray, that is, will result in peak
parameters that are wildly different from the norm, causing the
estimated variability of the parameters to be too high. For that
reason, in ipf 8.12, the interquartile range (IQR) as well as the
standard deviation (STD) is reported. The IQR is more robust to
outliers. For a normal distribution, the interquartile range is
equal to 1.34896 times the standard deviation. But if one or more
of the bootstrap sample fits fails, resulting in a distribution of
peak parameters with large outliers, the STD will be much greater
that the IQR. In that case, a more realistic estimate of
variability is IRQ/1.34896. It's best to try to increase the fit
stability by choosing a better model (e.g. using an equal-width or
fixed-width model, or a fixed-position shape, if
appropriate), adjusting the fitted range (pan and zoom keys), the
background subtraction (T
or B keys), or the start
positions (C key), and/or
selecting a higher number of fit trials per bootstrap (which will
increase the computation time). As a quick preliminary test of
bootstrap fit stability, pressing the N key will perform a single
fit to a single random bootstrap sample and plot the result;
do that several times to see whether the bootstrap fits are stable
enough to be worth computing a 100-sample bootstrap. Note: it's
normal for the stability of the bootstrap sample fits (N
key) to be poorer than the full-sample fits (F
key), because the latter includes only the variability
caused by changing the starting positions for one set of data and
noise, whereas the N and
V keys aim to
include the variability caused by the random noise in the
sample by fitting bootstrap sub-samples. Moreover, the best
estimates of the measured peak parameters are those obtained by
the normal fits of the full signal (F and X
keys), not the means
reported for the bootstrap samples (V and N
keys), because there are more independent data points in the full
fits and because the bootstrap means are influenced by the
outliers that occur more commonly in the bootstrap fits. The
bootstrap results are useful only for estimating the variability
of the peak parameters, not for estimating their mean values. The
N and V keys are also very useful way to determine if you
are using too many peaks in your model; superfluous peaks will
be very unstable when N is press
repeatedly and will have much higher standard deviation of
its peak height when the V key is used.
19. (version 11 and above) Shift-o fits a
simple polynomial (linear, quadratic, cubic, etc) to the segment
of the signal displayed in the upper panel and displays the
polynomial coefficients (in descending powers) and the R^{2}.
20. (version 11.1 and above) If some peaks are
saturated (clipped at a maximum height), you can make
the program ignore the saturated points by pressing Shift-M and entering the maximum Y values to keep.
Y values above this limit will simply be ignored; peaks below this
limit will be fit as usual.
21.
(version 11.1 and above) To constrain
the model to peaks above a certain width, press Shift-W
and enter the minimum peak with allowed.
Demoipf.m is a demonstration
script for ipf.m, with a built-in simulated
signal generator. The true values of the simulated peak
positions, heights, and widths are displayed in the Matlab command
window, for comparison to the FitResults obtained by peak fitting.
The default simulated signal contains six independent groups of
peaks that you can use for practice: a triplet near x = 150, a
singlet at 400, a doublet near 600, a triplet near 850, and two
broad single peaks at 1200 and 1700. Run this demo and see how close
to the actual true peak parameters you can get. The useful thing
about a simulation like this is that you can get a feel for the
accuracy of peak parameter measurements, that is, the difference
between the true and measured values of peak parameters. To download
this m-file, right-click on the links, select Save Link As..., and click Save. To run it it, place both ipf.m and Demoipf in the
Matlab path, then type Demoipf
at the Matlab command prompt.
An example of the use of this script is shown on the right.
Here we focus in on the 3 fused peaks located near x=850. The
true peak parameters (before the addition of the random noise) are:
Position
Height
Width Area
800
3
30
95.808
850
2
40
85.163
900
1
50
53.227
When these peaks are isolated in the upper window and fit with three
Gaussians, the results are
Position
Height
Width Area
800.04
3.0628
29.315 95.583
850.15
1.9881
41.014 86.804
901.3
0.9699
46.861 48.376
So you can see that the accuracy of the measurements are excellent
for peak position, good for peak height, and least good for peak
width and area. It's no surprise that the least accurate
measurements are for the smallest peak with the poorest
signal-to-noise ratio. Note: the predicted standard deviation of
these peak parameters can be determined by the bootstrap sampling
method, as described in the previous section. We would expect that
the measured values of the peak parameters (comparing the true to
the measured values) would be within about 2 standard deviations of
the true values listed above).
Demoipf2.m is identical, except
that the peaks are superimposed on a strong curved baseline, so
you can test the accuracy of the baseline correction methods (#
9 and 10, above).
Execution time
By execution time I mean the time it takes for one fit to be
performed, exclusive of plotting or printing the results. The
biggest factors that determine the execution time are (a) the speed
of the computer, (b) the number of peaks, and (c) the peak shape:
a) The execution time can vary over a factor of 4 or 5
or more between different computers, (e.g. a small laptop with 1.6
GHz, dual core Athlon CPU with 4 Gbytes RAM, vs a desktop with a
3.4 GHz i7 CPU with 16 Gbytes RAM). Run the Matlab "bench.m"
benchmark test to see how your computer stacks up.
b) The execution time increases with the square of the number of
peaks in the model. (See PeakfitTimeTest.m).
c) The execution time varies greatly (sometimes by a factor of 100
or more) with the peak shape, with the exponentially-broadened
Gaussian being the slowest and the fixed-width Gaussian being the
fastest. See PeakfitTimeTest2.m
and PeakfitTimeTest2a.m
The equal-width and fixed-width shape variations are always
faster.
d) The execution time increases directly with NumTrials in
peakfit.m. The "Best of 10 trials" function (X key in ipf.m) takes about 10
times longer than a single fit.
Other factors that are less important are the number of data points
in the fitted region (but only if the number of data points is very
large; see PeakfitTimeTest3.m) and
the starting values (good starting values can reduce execution time
slightly; PeakfitTimeTest2.m and PeakfitTimeTest2a.m have examples).
Some of these TimeTest scripts need DataMatrix2 and DataMatrix3.
Hints and Tips
It's best not to
smooth your data prior to curve fitting. Smoothing can distort
the signal shape and the noise distribution, making it harder to
evaluate the fit by visual inspection of the residuals plot.
Smoothing your data beforehand makes it impossible to
achieve the goal of obtaining a random unstructured residuals
plot and it increases the chance that you will "fit the noise"
rather that the actual signal.
The most important factor in non-linear iterative curve
fitting is selecting the underlying
model
peak function, for example, Gaussian, Equal-width
Guardians, Lorentzian, etc. (Some examples of this).
It's
worth spending some time finding and verifying a suitable
function for your data. In particular, if the peak widths of
each group of peaks is expected to be the same or nearly so,
select the "equal-width" shapes; equal-width fits (available for
the Gaussian and Lorentzian shapes) are faster, easier, and much
more stable than regular variable-width fits. But it's
important to understand that a good fit is not by itself
proof that the shape function you have chose it the
correct one; in come cases the wrong function can give a fit
that looks perfect. For example, this fit of a real data set
to a 5-peak Gaussian model has a low percent fitting error and
the residuals look random - usually an indicator of a good
fit. But in fact in this case the model is wrong; that
data came from an experimental domain where the underlying shape
is fundamentally non-Gaussian but in some cases can look very
like a Gaussian. It's important to get the model right for
the data and not depend solely on the goodness of fit.
You should always use the minimum number of peaks
that adequately fits your data. (Some examples of
this). Using too many peaks will result in an unstable fit
- the green lines in the upper plot, representing the individual
component peaks, will bounce around wildly for each repeated
fit, without significantly reducing the Error. A very useful way
to determine if you are using too many peaks in your model is to
use the N key (see #9, below) to perform a single fit to
a bootstrap sub-sample of points; superfluous peaks will be
very unstable when N is press repeatedly.
(You can get better statistics for this test, at the expense of
time, by using the V key to compute the standard
deviation of 100 bootstrap sub-samples).
If the peaks are superimposed on a background or baseline,
that has to be accounted for before fitting, otherwise
the peak parameters (especially height, width and area) will be
inaccurate. Either subtract the baseline from the entire signal
using the Backspace key (# 10 in Operating Instructions, above) or
use the T key to select one of the automatic baseline
correction modes (# 9 in Operating
Instructions, above).
This program uses an iterative non-linear search function ("modified
Simplex") to determine the peak positions and widths that
best match the data. This requires first guesses for the peak
positions and widths. (The peak heights don't require first guesses, because
they are linear parameters; the program determines them by
linear regression). The default first guesses for the peak
positions are made by the computer on the basis of the pan and
zoom settings and are indicated by the magenta vertical dashed
lines. The first guesses for the peak widths are computed from
the Zoom setting, so the best results will be obtained if you
zoom in so that the particular group of peaks is isolated and
spread out as suggested by the peak position markers (vertical
dashed lines).
If the peak components are very unevenly spaced, you might be
better off entering the first-guess peak positions yourself by
pressing the C key and
then clicking on the top graph where you think the peaks might
be. None of this has to be exact - they're just first guesses,
but if they are too far off it can throw the search algorithm
off. You can also type in the first guesses for position and
width manually by pressing Shift-C).
Each time you perform a repeat fit (e.g. pressing the F key), the program
adds small random deviations to the first guesses, in order to
determine whether an improved fit might be obtained with
slightly different first guesses. This is useful for determining
the robustness or stability of the fit with respect to
starting values. If the error and the values of the peak
parameters vary slightly in a tight region, this means that you
have a robust fit (for that number of peaks). If the error and
the values of the peak parameters bounces around wildly, it
means the fit is not robust (try changing the number of peaks,
peak shape, and the pan and zoom settings), or it may simply be
that the data are not good enough to be fit with that model. Try
pressing the X key, which takes the best of 10 fits and
also uses those best-fit values as the starting first
guesses for subsequent fits. So each time you press X, if
any of those fits yield a fitting error less that the previous
best, that one is taken as the start for the nest fit. As
a result, the fits tend to get better and better gradually as
the X key is pressed repeatedly. Often, even if the
first fit is terrible, subsequent X-key fits will
improve considerably.
The variability in the peak parameters from fit to fit using
the X or F keys is only an estimate of the
uncertainty caused by the curve fitting procedure (but not of the uncertainty
caused by the noise in the data, because this is only for one
sample of the data and noise; for that you need the N
key fits).
To examine the robustness or stability of the fit with
respect to random noise in the data, press the N
key. Each time you press N,
it will perform a fit on a different subset of data points
(called a "bootstrap sample"). If that gives reasonable-looking
fits, then you can go on to compute the peak error statistics by
pressing the V key. If on the other hand the N
key gives wildly different fits, with highly variable fitting
errors and peak parameters, then the fit is not stable, and you
might try the X key to take the best of 10 fits and
reset the starting guesses, then press N again. In
difficult cases, it may be necessary to increase the number of
trials when asked (but that will increase the time it takes to
complete), or if that does not help, use another model or a
better data set. (The N and V keys are a good way to evaluate a
multi-peak model for the possibility of superfluous peaks, see #
3 above).
If you don't find the peak shape you need in this program,
look at the next section , or write
me and I'll see what I can do.
How to add a new peak shape
It's easier than you think to add your own custom peak shape
to peakfit.m or ipf.m, assuming that you have a mathematical
expression for your shape. The easiest way is to modify an
existing peak shape that you do not plan to use, replacing it
with your new function. Pick a shape to sacrifice that has the same
number of variables are your new shape. For example, if your shape
has variable position and width, you could modify the Triangular
shape (number 21). If your shape has an 'extra' parameter, like the
Voigt, Pearson, BWF, or blended Gaussian/Lorentzian, use one of
those. If you need an exponentially modified shape, use the
exponentially modified Gaussian or Lorentzian. If you need equal
widths or fixed widths, etc, use one of those shapes. This is
important; you must have the same number of variables and
constraints.
There are just two required steps to the process:
Open peakfit.m or ipf.m in the Matlab editor and
re-write the old shape function (such as "triangle.m", located
near the end of the code - there is one of those functions for
each shape) by changing name of the function and the
mathematics of the assignment statement (e.g. g
= 1-(1./wid).*abs(x-pos);). You can using the same
variables ('x' for the independent variable, 'pos' for peak
position, 'wid' for peak width, etc). Scale your function to
have a peak height of 1.0 (e.g. after computing y as a
function of x, divide by max(y)).
Use the search function in Matlab to find all instances of
the name of the old function and replace it with the new name,
checking "Wrap around" but leaving "Match case" and "Whole
word" unchecked in the Search box. If you do it right,
for example, all instances of "triangular" and all instances
of "fittriangular' will be modified with your new name
replacing "triangular".
That's it! Your new shape will now be shape 21 (or whatever was the
shape number of the old shape you sacrificed) and, in ipf.m, will be
activated by the same keystroke used by the old shape (e.g. Shift-T
for the triangular, key number 84).
If you must, you can change the keystroke assignment in ipf.m; find
a key or Shift-Key that is not yet assigned, and therefore gives an
"UnassignedKey" error statement when you press it with ipf.m
running, and change the old key number 84 to that unassigned
one in the big "switch double(key)," case statement near
the beginning of the code. But it's simpler just to use the old
key.
Tom O'Haver
Professor Emeritus
Department of Chemistry and Biochemistry
The University of Maryland at College Park toh@umd.edu http://www.wam.umd.edu/~toh