A Matlab peak fitting program for time-series signals, which uses an
unconstrained non-linear
optimization algorithm to decompose a complex,
overlapping-peak signal into its component parts. The objective is to
determine whether your signal can be represented as the sum of
fundamental underlying peaks shapes. Accepts signals of any length,
including those with non-integer and non-uniform x-values. Fits groups
of peaks of Gaussian, Lorentzian, Logistic, Pearson, and
exponentially-broadened Gaussian shapes (expandable to other shapes). There are three different versions, a command line version, a keypress operated interactive version, and a version with mouse-controlled sliders (which requires Matlab 6.5).
Command line function: peakfit.m
Peakfit.m
is a user-defined command window peak fitting function, usable from a
remote terminal. It is written as a self-contained Matlab function in
a single m-file. (To view of download, click Peakfit.m). It takes data in the form of a 2 x n matrix that has x values in row 1 and Y values in
row 2. Several optional additional arguments may be added:
PEAKFIT(signal); Performs
an iterative least-squares fit of a single Gaussian peak to the
data matrix "signal".
PEAKFIT(signal,center,window); Fits
a single Gaussian peak to a portion of the matrix "signal". The
portion is centered on the x-value "center" and has width
"window" (in x units).
PEAKFIT(signal,center,window,NumPeaks); "NumPeaks" = number of peaks in the model (default is 1 if not specified).
PEAKFIT(signal,center,window,NumPeaks,peakshape); Specifies
the peak shape of the model: "peakshape" = 1-5. (1=Gaussian
(default), 2=Lorentzian, 3=logistic, 4=Pearson, and 5=exponentially broadened Gaussian)
PEAKFIT(signal,center,window,NumPeaks,peakshape,extra) Specifies
the value of 'extra', used in the Pearson and the exponentially
broadened Gaussian shapes to fine-tune the peak shape.
PEAKFIT(signal,center,window,NumPeaks,peakshape,extra,NumTrials); Performs
"NumTrials" trial fits and selects the best one (with lowest
fitting error). NumTrials can be any positive integer (default is 1).
PEAKFIT(signal,center,window,NumPeaks,peakshape,extra,NumTrials,start) Specifies
the first guesses vector "firstguess" for the peak positions and
widths, e.g. start=[position1 width1 position2 width2 ...]
[FitResults,MeanFitError]=PEAKFIT(signal,center,window...) Returns
the FitResults vector in the order peak number, peak position,
peak height, peak width, and peak area), and the MeanFitError (the
percent RMS difference between the data and the model in the
selected segment of that data) of the best fit.
Example 1: >> x=[0:.1:10];y=exp(-(x-5).^2);peakfit([x' y'])
Fits exp(-x)^2 with a single Gaussian peak model. Shown on the
right. Note that the peak area reported, 1.7724, is equal to the
theoretical
expectation (the square root of Pi).
Example 2: >> x=[0:.005:1];y=humps(x);peakfit([x' y'],.3,.7,1,4,3);
Fits a portion of the humps function, 0.7 units wide and centered on
x=0.3, with a single (NumPeaks=1) Pearson function
(peakshape=4) with extra=3 (controls shape of Pearson function).
Example 3: >> x=[0:.005:1];y=(humps(x)+humps(x-.13)).^3;smatrix=[x' y']; >> [FitResults,MeanFitError]=peakfit(smatrix,.4,.7,2,1,0,10)
Creates a data matrix 'smatrix', fits a portion to a two-peak
Gaussian model, takes the best of 10 trials. Returns
FitResults and MeanFitError. FitResults =
1 0.4128
3.1114e+008 0.10448 3.4605e+007
2 0.3161
2.8671e+008 0.098862 3.0174e+007 MeanFitError = 0.68048
Example 4: >> peakfit([x' y'],.4,.7,2,1,0,10,[.3 .1 .5 .1]);
As above, but specifies the first-guess position and width of the two
peaks, in the order [position1 width1 position2 width2]
Interactive keypress-operated version: ipf.m
ipf.m
is a version of the Interactive Peak Fitter that uses keyboard commands and the mouse cursor. It is
written as a self-contained
Matlab function, in a single m-file. The
syntax is ipf(x,y), where x and y are the independent and dependent
variables of your data set, ORipf(datamatrix) where "datamatrix" is a matrix that hasx values in row 1 and y values in row 2. Demoipf.m is a
demonstration script for ipf.m, with a built-in
simulated signal generator. To download these m-files, right-click on these links,
select Save Link As...,
and click Save.
In this example, pan and zoom
controls are used to isolate a segment of the signal that contains
three weak, noisy peaks The lower plot shows the whole signal and the
upper plot shows the segment. Only the peaks in that segment are
subjected to the fit.
Pan
and zoom are adjusted so that the signal returns to the local
baseline at the ends of the segment, allowing the program to compute
and subtract a linear baseline between those points. Pressing 3, G, F performs a 3-peak Gaussian fit.
Keyboard
Controls:
Pan left/right: Coarse pan: < and
>
Fine pan: left/right arrows
Zoom in/out: Coarse zoom: ? and " Fine
zoom: up/down arrows
Select # of peaks: Number keys 1-5
Select peak shape: G Gaussian L Lorentzian O Logistic P Pearson Perform one Fit: F Set baseline: B Custom
starts: C
Adjust 'extra': A,Z
Print fit info: Q
Print results only: R Print out data: D Prints commands K Autozero off/on T Best of 10 fits X
Operating instructions for ipf.m
At the command line, type ipf(x, y), (x =
independent
variable, y = dependent variable) or ipf(datamatrix) where "datamatrix" is a matrix that hasx values in row 1 and y values in row 2.
Use the Pan ( < and >) and Zoom (? and ") keys to isolate the peak or group of peaks that you want to fit. (You
can fine-tune the Pan and Zoom by using the cursor arrow keys on the
keyboard). The curve fitting operation applies only to the segment of
the signal shown in the top plot. The bottom plot shows the entire
signal. Try not to get any undesired peaks in the upper
window or the program will try to fit them.
Press
the 1,2,3,4 or 5
keys to choose the number of model peaks, that is, the minimum number
of peaks that you think might suffice to fit this segment of the
signal.
Press
the G,L,O,P, or E keys to select the
desired model peak shape. In this version there are five choices:
Gaussian (G), Lorentzian (L),
Logistic (O), Pearson (P),
and Exponentially-broadened Gaussian (E).
A set of vertical dashed lines are shown on the
plot, one for each model peak. Try to fine-tune the Pan
and Zoom keys so that the signal goes to the baseline at both ends
of the upper plot and so that the peaks (or bumps) in the signal
roughly line up with the vertical dashed lines, as show in the figures
on the left. This does not
have to be exact.
PressF to
initiate the curve-fitting calculation. Each time press F, another fit of the selected model to the data is performed with
slightly different starting positions, so that you can judge the
stability of the fit.
Keep your eye on the residuals plot and on the "Error %" display. Do
this several times, trying for the lowest error and the most
unstructured random residuals plot. At any time, you can refine the
baseline position (step 8), change the number or peaks (step 3), or
peak shape (step 4). To have the function automatically perform
10 trials fits and take the one with the lowest error, press the X key.
The model parameters of the last fit are displayed
in the upper window. For example, for a 3-peak fit:
The
column are, left to right: the peak number, peak position, peak height,
peak width, and the peak area. (In version 2.2 and later, the area of each component peak within
the upper window is computed using the trapezoidal method and displayed
after the width). Peaks are numbered from left to right. Pressing Q
prints out a brief report of results in the command window, like so:
Peak Shape = Gaussian
Number of peaks = 3
Fitted range = 5 - 6.64
Percent Error = 7.4514
Peak#
Position
Height
Width
Area
1
5.33329
14.8274
0.262253 4.13361
2
5.80253
26.825
0.326065
9.31117
3
6.27707
22.1461
0.249248 5.87425
Pressing R
prints out only the peak table (and the error as the 6th column) - useful for keeping a record of successive trial fits.
To turn ON and OFF the auto-zero mode, press the T
key. When auto-zero is ON, a straight-line baseline connection
the two ends of the signal segment in the upper panel will be
automatically subtracted. If you prefer to set the baseline manually,
press theB key,
then click on the baseline to the LEFT the peak(s), then click on the baseline to the RIGHT the
peak(s). The new baseline will be subtracted and the fit
re-calculated. (The new baseline remains in effect until you use the pan or zoom controls).
If you want to manually specify the first-guess peak
positions, pressC,
then click on your estimates of the peak positions in the upper graph,
once for each peak. A fit is automatically performed after the last
click. Peaks are numbered in the order clicked. (The custom start
positions remain in effect until you change the number of peaks or use
the pan or zoom controls).
The A and Z keys
is used only if you are
using the
Pearson or the exponentially-broadened Gaussian (ExpGaussian) shapes.
It controls the extra shape factor of the Pearson (a value of 1.0 gives
a Lorentzian shape, a value of 2.0 gives a shape roughly half-way
between a Lorentzian and a Gaussian, and a larger values give a nearly
Gaussian shape) and it also controls the time constant of the
exponentially broadened Gaussian. You can either adjust this to
minimize the
Error % or set it to a previously-determined value.
Demoipf.m is a
demonstration script for ipf.m, with a built-in
simulated signal generator. The true values of the simulated peak
positions,
heights, and widths are displayed in the Matlab command window, for
comparison to the FitResults obtained by peak fitting. The default
simulated signal contains six independent groups of peaks
that you can use for practice: a triplet near x = 150, a singlet at
400,
a doublet near 600, a triplet near 850, and two broad single peaks at
1200 and 1700. Run this demo and see how close to the actual true
peak parameters you can get. The useful thing about a simulation like
this is that you can get a feel for the accuracy of peak parameter
measurements, that is, the difference between the true and measured
values of peak parameters.
Interactive version with mouse-controlled sliders: InteractivePeakFitter.m
The opening screen
view shows the entire signal below and the zoomed-in segment above. Pan and zoom the signal to select the desired
group of peaks to be fit, adjusting so that the signal
goes to the baseline at both ends
of the upper plot and so that the peaks in the signal
roughly line up with the vertical dashed lines.
After selecting a speak shape and number of of peaks, press the F
key to perform a fit. A table of fit results are shown in the
upper window, with the component peaks shown in green. The
baseline can be subtracted automatically or manually.
The
bottom graph shows the residuals and standard deviation of the error.
Pressing F again repeats the fit with slightly different start
positions. Adjusting the zoom pan and zoom settings, refining the
background subtraction, and using custom start positions may help reduce the fitting error.
This figure shows a peak fit with an exponentially-broadened Gaussian shape. The time constant of the exponential is controlled by the "Extra" slider. Other available peak shapes include Gaussian ('g'), Lorentzian ('l'),
Logistic ('o'), and Pearson ('p'). Other shapes can be added if you know the mathematical form.
Operating Instructions for InteractivePeakFitter.m
Load your data set into the vectors x and y (x =
independent
variable, y = dependent variable) and then execute
InteractivePeakFitter.m.
Use the Pan and Zoom
sliders on the left and
right to isolate the peak or group of peaks that you want to fit. You
can fine-tune the Pan and Zoom by using the cursor arrow keys on the
keyboard. The curve fitting operation applies only to the segment of
the signal shown in the top plot. (The bottom plot shows the entire
signal). Try not to get any undesired peaks in the upper
window or the program will try to fit them.
Use the # Peaks slider (or press
the 1,2,3,4 or 5
keys) to choose the number of model peaks, that is, the minimum number
of peaks that you think might suffice to fit this segment of the
signal.
Use the Shape slider or press
the G,L,O,P, or E keys to select the
desired model peak shape. In this version there are five choices:
Gaussian ('g'), Lorentzian ('l'),
Logistic ('o'), Pearson ('p'),
and Exponentionally-broadened gaussian ('e').
A set of vertical dashed lines are shown on the
plot, one for each model peak. Try to fine-tune the Pan
and Zoom using the sliders or the
cursor arrow keys so that the signal goes to the baseline at both ends
of the upper plot and so that the peaks (or bumps) in the signal
roughly line up with the vertical dashed lines, as show in the figures
on the left. This does not
have to be exact.
Click on the Re-fit slider or
press 'f' to
initiate the curve-fitting calculation. Each time you click on this
slider, another fit of the selected model to the data is performed.
Keep your eye on the residuals plot and on the "Error %" display. Do
this several times, trying for the lowest error and the most
unstructured random residuals plot. At any time, you can refine the
baseline position (step 8), change the number or peaks (step 3), or
peak shape (step 4).
The model parameters of the last fit are displayed
in the upper window. For example, for a 3-peak fit:
The
column are, left to right: the peak number, peak position, peak
height, peak width, and peak area. (In version 2.2 and later, the
area of each component peak within
the upper window is computed using the trapezoidal method and displayed
after the width). Peaks are numbered from left to right. To print this
out in the Matlab command window, type "FitResults" in
the Matlab command window and press Enter. FitResults(:,2) lists
the peak positions, FitResults(:,3)
lists the peak heights, FitResults(:,4)
lists the peak widths, and FitResults(:,5)
lists the peak areas falling within the upper window.
If you wish to set the baseline manually, press the
'B' key or click the BG
button,
then click on the baseline to the LEFT the peak(s), then click on the
baseline to the RIGHT the
peak(s). The new baseline will be subtracted and the fit
re-calculated. (The new baseline remains in effect until you use the
pan or zoom controls to change the fitting region; then you have to
re-do the baseline for that new region).
If you want to manually specify the first-guess peak
positions, click the Custom button or press 'c',
then click on your estimates of the peak positions in the upper graph,
once for each peak. A fit is automatically performed after the last
click. Peaks are numbered in the order clicked. (The custom start
positions remain in effect until you change the number of peaks or use
the pan or zoom controls).
The Extra slider (or the A and Z keys, in the
keyboard-operated version ipf.m)
is used only if you are
using the
Pearson or the exponentially-broadened Gaussian (ExpGaussian) shapes.
It controls the extra shape factor of the Pearson (a value of 1.0 gives
a Lorentzian shape, a value of 2.0 gives a shape roughly half-way
between a Lorentzian and a Gaussian, and a larger values give a nearly
Gaussian shape) and it also controls the time constant of the
exponentially broadened gaussian. You can either adjust this to
minimize the
Error % or set it to a previously-determined value. You can change this
slider's range to suit your proposes in
lines 47-48 of InteractivePeakFitter.m.
DemoInteractivePeakFitter.m
Demonstration script for the Interactive Peak Fitter with a built-in simulated signal generator.
Requires Matlab 6.5. The true values of the simulated peak positions,
heights, and widths are displayed in the Matlab command window, for
comparison to the FitResults obtained by peak fitting.
The default simulated signal contains six independent groups of peaks
that you can use for practice: a triplet at x = 150, a singlet at 400,
a doublet near 600, a triplet at 850, and two broad single peaks at
1200 and 1700. You can change the character of the simulated signal in
lines 23-28.
Notes for all versions
Note 1: When a fit is performed, the lower
half of the figure
window displays the residuals (differences between the
model and the data). The "Error" or "Percent Error" reported by this program is
the RMS difference between the best-fit model and the raw data over the
fitted segment.
Note 2: When the number of peaks is 2 or
greater, the green
lines displayed in the upper plot after a fit is performed are the individual model peaks; the red line is
the total model, i.e. the sum of the components, which is a
least-squares best-fit to the blue data points.
Note 3: In the slider version, if the range of the sliders is
inappropriate for your signal, you can adjust the slider ranges in
lines 45-52 of InteractivePeakFitter.m.
Note 4: The Extra
slider (or the A
and Z keys,
in the keypress-operated version ipf.m)
is used to manually control the global variable "extra",
which could be used for a variety of purposes. If you add your
own peak shape (see "How to add a new peak shape", below) , you can use
the extra variable for any purpose, is you need another adjustable
parameter.
Note 5: By program automatically
subtracts a linear
baseline from the segment of signal displayed in the upper window. The baseline is
calculated by interpolating from the average of the first 1/10th of the
points in the segment to the average of the
last 1/10th of the points. Try to adjust the pan and
zoom to include some of the baseline at the beginning and end of the segment in
the upper window, allowing the automatic baseline subtract gets a good
reading of the baseline If that's not possible, you can still
refine the baseline subtraction manually if you wish, by using the the
BG button or press 'B' as described in step 8,
above. To disable the auto-zero operation, press T.
Note 6:
The peak areas are calculated only for the portion of the component
peaks falling within the upper window. If some of the component
peaks are close enough to the edges of the window that they tail off
outside the window, their peak areas will be inaccurate.
Note 7:
The "Error" or "Percent Error" reported by this program is simply
the RMS difference between the best-fit model and the raw data over the
fitted segment. It is not
the error in the parameters derived
from the fit (peak position, height, and width). The accuracy of
those parameters (the difference between the true and measured values
of peak parameters) is much harder to estimate, because of course
you don't know the true values of the peak parameters. However, the precision
of measurement can be estimated if you can obtain several repeat
samples of the same
signal (with independent random noise), fit each sample using the
same procedure, and calculate the standard deviation of the parameters
from all those fits. If the random errors arising from random
noise in the signal or random variations is the accuracy of background
subtraction are the dominant sources of error, then an estimate of the
precision is a good indication of the accuracy of parameter
measurement. More on accuracy and precision of peak parameter measurement.
Hints and Tips
It's best not to
smooth your data prior to curve fitting. Smoothing can distort
the signal shape and the noise distribution, making it harder to
evaluate the fit by visual inspection of the residuals plot.
Smoothing your data beforehand makes it impossible to achieve the
goal of obtaining a random unstructured residuals plot and it increases
the chance that you will "fit the noise" rather that the actual signal.
The
most important factor in non-linear iterative curve
fitting is selecting the underlying model peak function, for example, Gaussian,
Lorentzian, etc. (Some examples of this). It's worth spending some time finding and verifying a
suitable function for your data. (See the instructions below for adding
a new peak shape to the program).
You should always use the minimum number of peaks that
adequately fits your data. (Some examples of this). Using too many peaks will result in an
unstable fit - the green lines in the upper plot, representing the
individual component peaks, will bounce around wildly as the Re-fit
slider is moved, without significantly reducing the Error %.
This program uses an iterative non-linear search function
(modified Simplex) to determine the peak positions and widths that best
match the data. This requires first guesses for the peak positions and
widths. (The peak heights
don't
require first guesses, because they are linear parameters; the program
determines them by linear regression). The default first guesses for
the peak
positions
are made by the computer on the basis of the pan and zoom slider
settings and are indicated by the magenta vertical dashed lines. The
first guesses for the peak widths are computed from the Zoom setting,
so the best results will be obtained if you zoom in so that the
particular group of peaks is isolated and spread out as suggested by
the peak position markers (vertical dashed lines).
If the peak components are very unevenly spaced, you might
be better off entering the first-guess peak positions yourself by
pressing the C key (clicking on the Custom slider) and then clicking on
the top
graph where you think the peaks might be. None of this has to be exact
- they're just first guesses, but if they are too far off it can throw
the search algorithm off.
Each time you perform a repeat fit (e.g. pressing the F
key), the program adds small random deviations to the
first guesses, in order to determine whether an improved fit might be
obtained with slightly different first guesses. This is useful for
determining the robustness or stability of the fit. If the error and
the values of the peak parameters vary slightly in a tight region, this
means that you have a robust fit (for that number of peaks).
If the error and the values of the peak parameters bounces around
wildly, it means the fit is not robust (try changing the number of
peaks, peak shape, and the pan and zoom settings), or it may simply be
that the data are not good enough to be fit with that model. The
variability in the peak parameters from fit to fit is an estimate of
the uncertainty caused by the curve fitting procedure (but not of
the uncertainty caused by the noise in the data, because this is only
for one sample of the data and noise). Pressing F and R alternately will record the results of each successive fit in the command window.
How to add a new peak shape
To add a new peak shape to the five initial shapes, you
must have
a function for that peak shape, similar to gaussian.m, that takes as
arguments a vector of x-values, a peak-center value, and a peak width,
and returns a vector of that function evaluated at each value of x.
Let's call it NewPeakShape. Add NewPeakShape to the PeakFitter folder (or if you are using the keypress version ipf.m,
there is already a "NewPeakShape" function at the end of the code;
just replace "g=sin(a.*x+b)" with your own function of x,a, and b, and
return the value in g).
Create a fitting function for this peak shape, similar to
fitgaussian.m.
function err = fitNewPeakShape(lambda,t,y) % Fitting function for NewPeakShape. global c A = zeros(length(t),round(length(lambda)/2)); for j = 1:length(lambda)/2, A(:,j) = NewPeakShape(t,lambda(2*j-1),lambda(2*j))'; end c = A\y'; z = A*c; err = norm(z-y');
Save this function as "fitNewPeakShape" into the PeakFitter folder (or if you are using the single-function keypress version ipf.m, there is already a "fitNewPeakShape" rear the end of the code; just modify its name to suit).
Change FitAndPlot.m to add a 6th case to the switch
statement that starts at at line 20:
case 6 FitParameters=FMINSEARCH('fitNewPeakShape',start,options,x,y); ShapeString='NewPeakShape';
and at to the one that starts at line 52:
case 6 A(m,:)=NewPeakShape(x,FitParameters(2*m-1),FitParameters(2*m));
Note:
If your function requires three adjustable arguments rather than two,
you can assign one to the Extra slider, as is done here for the
exponentially-broadened Gaussian and Pearson-type shape. In that case,
add "extra" as the last argument of the function calls above, as is
done in case 4 and 5 of the two switch statements in FitAndPlot.m.
Change this slider's range to suit your proposes in lines 47-48 of
InteractivePeakFitter.m.
Change ipfshape.m to add a 6th case to the switch
statement:
case 6 ShapeString='NewPeakShape';
Change ipfpeaks.m to add a 6th case to the switch
statement:
case 6 ShapeString='NewPeakShape';
Assign a key to the new peak shape and change
KeyPressTest.m to add a case to the switch statement in KeyPressTest to
interpret this keypress, similar to the other shapes.
Change line 39 in InteractivePeakFitter.m to read
MaxShapes=6.
ZIP file
containing all of the above Interactive Peak Fitter functions and demos.
Tom O'Haver
Professor Emeritus
Department of Chemistry and Biochemistry
The University of Maryland at College Park toh@umd.edu http://www.wam.umd.edu/~toh