index previous next

Interactive Peak Fitter (May 2009)

A Matlab peak fitting program for time-series signals, which uses an unconstrained non-linear optimization algorithm to decompose a complex, overlapping-peak signal into its component parts. The objective is to determine whether your signal can be represented as the sum of fundamental underlying peaks shapes. Accepts signals of any length, including those with non-integer and non-uniform x-values. Fits groups of peaks of Gaussian, Lorentzian, Logistic, Pearson, and exponentially-broadened Gaussian shapes (expandable to other shapes). There are three different versions, a command line version, a keypress operated interactive version, and a version with mouse-controlled sliders (which requires Matlab 6.5).   

Command line function: peakfit.m

Peakfit.m is a user-defined command window peak fitting function, usable from a remote terminal.  It is written as a self-contained Matlab function in a single m-file. (To view of download, click Peakfit.m). It takes data in the form of a 2 x n matrix that has x values in row 1 and Y values in row 2.  Several optional additional arguments may be added:

peakfit2small.GIFPEAKFIT(signal);
Performs an iterative least-squares fit of a single Gaussian peak to the data matrix "signal".
 
PEAKFIT(signal,center,window);
Fits a single Gaussian peak to a portion of the matrix "signal". The portion is centered on the x-value "center" and has width "window" (in x units).
 
PEAKFIT(signal,center,window,NumPeaks);
"NumPeaks" = number of peaks in the model (default is 1 if not  specified).
 
PEAKFIT(signal,center,window,NumPeaks,peakshape);
Specifies the peak shape of the model: "peakshape" = 1-5.  (1=Gaussian (default), 2=Lorentzian, 3=logistic, 4=Pearson, and   5=exponentially broadened Gaussian)
 
PEAKFIT(signal,center,window,NumPeaks,peakshape,extra)
Specifies the value of 'extra', used in the Pearson and the exponentially broadened Gaussian shapes to fine-tune the peak shape.
 
PEAKFIT(signal,center,window,NumPeaks,peakshape,extra,NumTrials);
Performs "NumTrials" trial fits and selects the best one (with lowest  fitting error). NumTrials can be any positive integer (default is 1).
 
PEAKFIT(signal,center,window,NumPeaks,peakshape,extra,NumTrials,start)
Specifies the first guesses vector "firstguess" for the peak positions  and widths, e.g. start=[position1 width1 position2 width2 ...]
 
[FitResults,MeanFitError]=PEAKFIT(signal,center,window...)
Returns the FitResults vector in the order peak number, peak  position, peak height, peak width, and peak area), and the MeanFitError (the percent RMS difference between the data and the model in the  selected segment of that data) of the best fit.
 
Example 1: Peakfit1small.gif
 >> x=[0:.1:10];y=exp(-(x-5).^2);peakfit([x' y'])
 
Fits exp(-x)^2 with a single Gaussian peak model. Shown on the right. Note that the peak area reported, 1.7724, is equal to the theoretical expectation (the square root of  Pi).
 
Example 2:
  >> x=[0:.005:1];y=humps(x);peakfit([x' y'],.3,.7,1,4,3);
   
Fits a portion of the humps function, 0.7 units wide and centered on   x=0.3, with a single (NumPeaks=1) Pearson function (peakshape=4) with extra=3 (controls shape of Pearson function).

Example 3:
  >> x=[0:.005:1];y=(humps(x)+humps(x-.13)).^3;smatrix=[x' y'];
  >> [FitResults,MeanFitError]=peakfit(smatrix,.4,.7,2,1,0,10)

  
Creates a data matrix 'smatrix', fits a portion to a two-peak Gaussian  model, takes the best of 10 trials.  Returns FitResults and MeanFitError.
   FitResults =
               1       0.4128  3.1114e+008     0.10448   3.4605e+007
               2       0.3161  2.8671e+008     0.098862  3.0174e+007
   MeanFitError =
         0.68048

 
Example 4:
 >> peakfit([x' y'],.4,.7,2,1,0,10,[.3 .1 .5 .1]);
  As above, but specifies the first-guess position and width of the two peaks, in the order [position1 width1 position2 width2]

Interactive keypress-operated version: ipf.m

ipf.m is a version of the Interactive Peak Fitter that uses keyboard commands and the mouse cursor.  It is written as a self-contained Matlab function, in a single m-file. The syntax is ipf(x,y), where x and y are the independent and dependent variables of your data set, OR ipf(datamatrix) where "datamatrix" is a matrix that has x values in row 1 and y values in row 2.  Demoipf.m is a demonstration script for ipf.m, with a built-in simulated signal generator. To download these m-files, right-click on these links, select Save Link As..., and click Save.

In this example, pan and zoom controls are used to isolate a segment of the signal that contains three weak, noisy peaks The lower plot shows the whole signal and the upper plot shows the segment.  Only the peaks in that segment are subjected to the fit.


Pan and zoom are adjusted so that  the signal returns to the local baseline at the ends of the segment, allowing the program to compute and subtract a linear baseline between those points.  Pressing 3, G, F performs a 3-peak Gaussian fit.

Keyboard Controls:

Pan left/right: Coarse pan: < and >          
                Fine pan: left/right arrows

Zoom in/out:    Coarse zoom: ? and "  
     
                Fine zoom: up/down arrows

Select # of peaks:  Number keys 1-5

Select peak shape:   G Gaussian
                     L Lorentzian
         
                     O Logistic

                     P Pearson

Perform one Fit:     F

Set baseline:        B

Custom starts:       C

Adjust 'extra':      A,Z

Print fit info:      Q

Print results only:  R

Print out data:      D
Prints commands      K
Autozero off/on      T
Best of 10 fits
     X

Operating instructions for ipf.m

  1. At the command line, type ipf(x, y), (x = independent variable, y = dependent variable) or  ipf(datamatrix) where "datamatrix" is a matrix that has x values in row 1 and y values in row 2. 
  2. Use the Pan ( < and >) and Zoom (? and ") keys  to isolate the peak or group of peaks that you want to fit. (You can fine-tune the Pan and Zoom by using the cursor arrow keys on the keyboard). The curve fitting operation applies only to the segment of the signal shown in the top plot. The bottom plot shows the entire signal.  Try not to get any undesired peaks in the upper window or the program will try to fit them.
  3. Press the 1,2,3,4 or 5 keys to choose the number of model peaks, that is, the minimum number of peaks that you think might suffice to fit this segment of the signal.
  4. Press the G,L,O,P, or E keys to select the desired model peak shape. In this version there are five choices: Gaussian (G), Lorentzian (L), Logistic (O), Pearson (P), and Exponentially-broadened Gaussian (E).
  5. A set of vertical dashed lines are shown on the plot, one for each model peak. Try to fine-tune the Pan and Zoom keys so that the signal goes to the baseline at both ends of the upper plot and so that the peaks (or bumps) in the signal roughly line up with the vertical dashed lines, as show in the figures on the left. This does not have to be exact.
  6. Press F to initiate the curve-fitting calculation. Each time press F, another fit of the selected model to the data is performed with slightly different starting positions, so that you can judge the stability of the fit. Keep your eye on the residuals plot and on the "Error %" display. Do this several times, trying for the lowest error and the most unstructured random residuals plot. At any time, you can refine the baseline position (step 8), change the number or peaks (step 3), or peak shape (step 4).  To have the function automatically perform 10 trials fits and take the one with the lowest error, press the X key.
  7. The model parameters of the last fit are displayed in the upper window. For example, for a 3-peak fit:
    1      5.33329      14.8274     0.262253      4.13361
    2 5.80253 26.825 0.326065 9.31117
    3 6.27707 22.1461 0.249248 5.87425
    The column are, left to right: the peak number, peak position, peak height, peak width, and the peak area. (In version 2.2 and later, the area of each component peak within the upper window is computed using the trapezoidal method and displayed after the width). Peaks are numbered from left to right. Pressing Q prints out a brief report of results in the command window, like so:

    Peak Shape = Gaussian
    Number of peaks = 3
    Fitted range = 5 - 6.64
    Percent Error = 7.4514
    Peak#  Position      Height       Width        Area
    1      5.33329      14.8274     0.262253      4.13361
    2      5.80253       26.825     0.326065      9.31117
    3      6.27707      22.1461     0.249248      5.87425

    Pressing R prints out only the peak table (and the error as the 6th column) - useful for keeping a record of successive trial fits.  
  8. To turn ON and OFF the auto-zero mode, press the T key.  When auto-zero is ON, a straight-line baseline connection the two ends of the signal segment in the upper panel will be automatically subtracted. If you prefer to set the baseline manually, press the B key, then click on the baseline to the LEFT the peak(s), then click on the baseline to the RIGHT the peak(s). The new baseline will be subtracted and the fit re-calculated. (The new baseline remains in effect until you use the pan or zoom controls).
  9. If you want to manually specify the first-guess peak positions, press C, then click on your estimates of the peak positions in the upper graph, once for each peak. A fit is automatically performed after the last click. Peaks are numbered in the order clicked.  (The custom start positions remain in effect until you change the number of peaks or use the pan or zoom controls).
  10. The A and Z keys is used only if you are using the Pearson or the exponentially-broadened Gaussian (ExpGaussian) shapes. It controls the extra shape factor of the Pearson (a value of 1.0 gives a Lorentzian shape, a value of 2.0 gives a shape roughly half-way between a Lorentzian and a Gaussian, and a larger values give a nearly Gaussian shape) and it also controls the time constant of the exponentially broadened Gaussian. You can either adjust this to minimize the Error % or set it to a previously-determined value.

Demoipf.m

 Demoipf.m is a demonstration script for ipf.m, with a built-in simulated signal generator. The true values of the simulated peak positions, heights, and widths are displayed in the Matlab command window, for comparison to the FitResults obtained by peak fitting. The default simulated signal contains six independent groups of peaks that you can use for practice: a triplet near x = 150, a singlet at 400, a doublet near 600, a triplet near 850, and two broad single peaks at 1200 and 1700.  Run this demo and see how close to the actual true peak parameters you can get. The useful thing about a simulation like this is that you can get a feel for the accuracy of peak parameter measurements, that is, the difference between the true and measured values of peak parameters.  

Interactive version with mouse-controlled sliders: InteractivePeakFitter.m

Interactive Peak Fitter script for your own signal. Requires Matlab 6.5. Click here to download the ZIP file "PeakFitter.zip" that also includes supporting functions and a self-contained demo to show how it works. You can also download it from the Matlab File Exchange.

Click to view enlarged figure

The opening screen view shows the entire signal below and the zoomed-in segment above. Pan and zoom the signal to select the desired group of peaks to be fit, adjusting so that the signal goes to the baseline at both ends of the upper plot and so that the peaks in the signal roughly line up with the vertical dashed lines.

Click to view enlarged figure

After selecting a speak shape and number of of peaks, press the
F key to perform a fit.  A table of fit results are shown in the upper window, with the component  peaks shown in green. The baseline can be subtracted automatically or manually.

Click to view enlarged figure

The bottom graph shows the residuals and standard deviation of the error.  Pressing F again repeats the fit with slightly different start positions.  Adjusting the zoom pan and zoom settings, refining the background subtraction, and using custom start positions may help reduce the fitting error.
 

Click to view enlarged figure

This figure shows a peak fit with an
exponentially-broadened Gaussian shape.  The time constant of the exponential is controlled by the "Extra" slider. Other available peak shapes include Gaussian ('g'), Lorentzian ('l'), Logistic ('o'), and Pearson ('p'). Other shapes can be added if you know the mathematical form.


Operating Instructions for InteractivePeakFitter.m

  1. Load your data set into the vectors x and y (x = independent variable, y = dependent variable) and then execute InteractivePeakFitter.m.
  2. Use the Pan and Zoom sliders on the left and right to isolate the peak or group of peaks that you want to fit. You can fine-tune the Pan and Zoom by using the cursor arrow keys on the keyboard. The curve fitting operation applies only to the segment of the signal shown in the top plot. (The bottom plot shows the entire signal).  Try not to get any undesired peaks in the upper window or the program will try to fit them.
  3. Use the # Peaks slider (or press the 1,2,3,4 or 5 keys) to choose the number of model peaks, that is, the minimum number of peaks that you think might suffice to fit this segment of the signal.
  4. Use the Shape slider or press the G,L,O,P, or E keys to select the desired model peak shape. In this version there are five choices: Gaussian ('g'), Lorentzian ('l'), Logistic ('o'), Pearson ('p'), and Exponentionally-broadened gaussian ('e').
  5. A set of vertical dashed lines are shown on the plot, one for each model peak. Try to fine-tune the Pan and Zoom using the sliders or the cursor arrow keys so that the signal goes to the baseline at both ends of the upper plot and so that the peaks (or bumps) in the signal roughly line up with the vertical dashed lines, as show in the figures on the left. This does not have to be exact.
  6. Click on the Re-fit slider or press 'f' to initiate the curve-fitting calculation. Each time you click on this slider, another fit of the selected model to the data is performed. Keep your eye on the residuals plot and on the "Error %" display. Do this several times, trying for the lowest error and the most unstructured random residuals plot. At any time, you can refine the baseline position (step 8), change the number or peaks (step 3), or peak shape (step 4).
  7. The model parameters of the last fit are displayed in the upper window. For example, for a 3-peak fit:
    1      5.33329      14.8274     0.262253      4.13361
    2 5.80253 26.825 0.326065 9.31117
    3 6.27707 22.1461 0.249248 5.87425
    The column are, left to right: the peak number, peak position, peak height, peak width, and peak area. (In version 2.2 and later, the area of each component peak within the upper window is computed using the trapezoidal method and displayed after the width). Peaks are numbered from left to right. To print this out in the Matlab command window, type "FitResults" in the Matlab command window and press Enter. FitResults(:,2) lists the peak positions, FitResults(:,3) lists the peak heights, FitResults(:,4) lists the peak widths, and FitResults(:,5) lists the peak areas falling within the upper window.
  8. If you wish to set the baseline manually, press the 'B' key or click the BG button, then click on the baseline to the LEFT the peak(s), then click on the baseline to the RIGHT the peak(s). The new baseline will be subtracted and the fit re-calculated. (The new baseline remains in effect until you use the pan or zoom controls to change the fitting region; then you have to re-do the baseline for that new region).
  9. If you want to manually specify the first-guess peak positions, click the Custom button or press 'c', then click on your estimates of the peak positions in the upper graph, once for each peak. A fit is automatically performed after the last click. Peaks are numbered in the order clicked.  (The custom start positions remain in effect until you change the number of peaks or use the pan or zoom controls).
  10. The Extra slider (or the A and Z keys, in the keyboard-operated version  ipf.m) is used only if you are using the Pearson or the exponentially-broadened Gaussian (ExpGaussian) shapes. It controls the extra shape factor of the Pearson (a value of 1.0 gives a Lorentzian shape, a value of 2.0 gives a shape roughly half-way between a Lorentzian and a Gaussian, and a larger values give a nearly Gaussian shape) and it also controls the time constant of the exponentially broadened gaussian. You can either adjust this to minimize the Error % or set it to a previously-determined value. You can change this slider's range to suit your proposes in lines 47-48 of InteractivePeakFitter.m.

DemoInteractivePeakFitter.m

Demonstration script for the Interactive Peak Fitter with a built-in simulated signal generator. Requires Matlab 6.5. The true values of the simulated peak positions, heights, and widths are displayed in the Matlab command window, for comparison to the FitResults obtained by peak fitting.
>> ColumnLabels =
Peak# Positon Height Width
ActualParameters =
1 100 1 30
2 130 2 40
3 200 3 50
4 400 1 30
5 600 4 20
6 630 1 20
7 800 3 30
8 850 2 40
9 900 1 50
10 1200 2 100
The default simulated signal contains six independent groups of peaks that you can use for practice: a triplet at x = 150, a singlet at 400, a doublet near 600, a triplet at 850, and two broad single peaks at 1200 and 1700. You can change the character of the simulated signal in lines 23-28.

Notes for all versions

Note 1: When a fit is performed, the lower half of the figure window displays the residuals (differences between the model and the data).  The "Error" or "Percent Error" reported by this program is the RMS difference between the best-fit model and the raw data over the fitted segment.

Note 2: When the number of peaks is 2 or greater, the green lines displayed in the upper plot after a fit is performed are the individual model peaks; the red line is the total model, i.e. the sum of the components, which is a least-squares best-fit to the blue data points.  

Note 3: In the slider version, if the range of the sliders is inappropriate for your signal, you can adjust the slider ranges in lines 45-52 of InteractivePeakFitter.m.

Note 4: The Extra slider (or the A and Z keys, in the keypress-operated version ipf.m) is used to manually control the global variable "extra", which could be used for a variety of purposes.  If you add your own peak shape (see "How to add a new peak shape", below) , you can use the extra variable for any purpose, is you need another adjustable parameter.

Note 5: By program automatically subtracts a linear baseline from the segment of signal displayed in the upper window. The baseline is calculated by interpolating from the average of the first 1/10th of the points in the segment to the average of  the last 1/10th of the points. Try to adjust the pan and zoom to include some of the baseline at the beginning and end of the segment in the upper window, allowing the automatic baseline subtract gets a good reading of the baseline  If that's not possible, you can still refine the baseline subtraction manually if you wish, by using the the BG button or press 'B' as described in step 8, above. To disable the auto-zero operation, press T.  

Note 6: The peak areas are calculated only for the portion of the component peaks falling within the upper window.  If some of the component peaks are close enough to the edges of the window that they tail off outside the window, their peak areas will be inaccurate.

Note 7:  The "Error" or "Percent Error" reported by this program is simply the RMS difference between the best-fit model and the raw data over the fitted segment. It is not the error in the parameters derived from the fit (peak position, height, and width).  The accuracy of those parameters (the difference between the true and measured values of peak parameters) is much harder to estimate, because of course you don't know the true values of the peak parameters. However, the precision of measurement can be estimated if you can obtain several repeat samples of the same signal (with independent random noise), fit each sample using the same procedure, and calculate the standard deviation of the parameters from all those fits.  If the random errors arising from random noise in the signal or random variations is the accuracy of background subtraction are the dominant sources of error, then an estimate of the precision is a good indication of the accuracy of parameter measurement.  More on accuracy and precision of peak parameter measurement.



Hints and Tips

  1. It's best not to smooth your data prior to curve fitting.  Smoothing can distort the signal shape and the noise distribution, making it harder to evaluate the fit by visual inspection of the residuals plot.  Smoothing your data beforehand makes it impossible to achieve the goal of obtaining a random unstructured residuals plot and it increases the chance that you will "fit the noise" rather that the actual signal.
  2. The most important factor in non-linear iterative curve fitting is selecting the underlying model peak function, for example, Gaussian, Lorentzian, etc. (Some examples of this). It's worth spending some time finding and verifying a suitable function for your data. (See the instructions below for adding a new peak shape to the program).
  3. You should always use the minimum number of peaks that adequately fits your data. (Some examples of this). Using too many peaks will result in an unstable fit - the green lines in the upper plot, representing the individual component peaks, will bounce around wildly as the Re-fit slider is moved, without significantly reducing the Error %.  
  4. This program uses an iterative non-linear search function (modified Simplex) to determine the peak positions and widths that best match the data. This requires first guesses for the peak positions and widths. (The peak heights don't require first guesses, because they are linear parameters; the program determines them by linear regression). The default first guesses for the peak positions are made by the computer on the basis of the pan and zoom slider settings and are indicated by the magenta vertical dashed lines. The first guesses for the peak widths are computed from the Zoom setting, so the best results will be obtained if you zoom in so that the particular group of peaks is isolated and spread out as suggested by the peak position markers (vertical dashed lines). 
  5. If the peak components are very unevenly spaced, you might be better off entering the first-guess peak positions yourself by pressing the C key (clicking on the Custom slider) and then clicking on the top graph where you think the peaks might be. None of this has to be exact - they're just first guesses, but if they are too far off it can throw the search algorithm off.
  6. Each time you perform a repeat fit (e.g. pressing the F key), the program adds small random deviations to the first guesses, in order to determine whether an improved fit might be obtained with slightly different first guesses. This is useful for determining the robustness or stability of the fit. If the error and the values of the peak parameters vary slightly in a tight region, this means that you have a robust fit (for that number of peaks). If the error and the values of the peak parameters bounces around wildly, it means the fit is not robust (try changing the number of peaks, peak shape, and the pan and zoom settings), or it may simply be that the data are not good enough to be fit with that model. The variability in the peak parameters from fit to fit is an estimate of the uncertainty caused by the curve fitting procedure (but not of the uncertainty caused by the noise in the data, because this is only for one sample of the data and noise).  Pressing F and R alternately will record the results of each successive fit in the command window.

How to add a new peak shape

  1. To add a new peak shape to the five initial shapes, you must have a function for that peak shape, similar to gaussian.m, that takes as arguments a vector of x-values, a peak-center value, and a peak width, and returns a vector of that function evaluated at each value of x. Let's call it NewPeakShape. Add NewPeakShape to the PeakFitter folder (or if you are using the keypress version ipf.m, there is already a "NewPeakShape" function at the end of the code; just replace "g=sin(a.*x+b)" with your own function of x,a, and b, and return the value in g).
  2. Create a fitting function for this peak shape, similar to fitgaussian.m.
    function err = fitNewPeakShape(lambda,t,y)
    % Fitting function for NewPeakShape.
    global c
    A = zeros(length(t),round(length(lambda)/2));
    for j = 1:length(lambda)/2,
    A(:,j) = NewPeakShape(t,lambda(2*j-1),lambda(2*j))';
    end
    c = A\y';
    z = A*c;
    err = norm(z-y');
    Save this function as "fitNewPeakShape" into the PeakFitter folder (or if you are using the single-function keypress version ipf.m, there is already a "fitNewPeakShape" rear the end of the code; just modify its name to suit).
  3. Change FitAndPlot.m to add a 6th case to the switch statement that starts at at line 20:
     case 6
    FitParameters=FMINSEARCH('fitNewPeakShape',start,options,x,y);
    ShapeString='NewPeakShape';
    and at to the one that starts at line 52:
     case 6
    A(m,:)=NewPeakShape(x,FitParameters(2*m-1),FitParameters(2*m));
    Note: If your function requires three adjustable arguments rather than two, you can assign one to the Extra slider, as is done here for the exponentially-broadened Gaussian and Pearson-type shape. In that case, add "extra" as the last argument of the function calls above, as is done in case 4 and 5 of the two switch statements in FitAndPlot.m. Change this slider's range to suit your proposes in lines 47-48 of InteractivePeakFitter.m.
  4. Change ipfshape.m to add a 6th case to the switch statement:
     case 6
    ShapeString='NewPeakShape';
  5. Change ipfpeaks.m to add a 6th case to the switch statement:
     case 6
    ShapeString='NewPeakShape';
  6. Assign a key to the new peak shape and change KeyPressTest.m to add a case to the switch statement in KeyPressTest to interpret this keypress, similar to the other shapes.
  7. Change line 39 in InteractivePeakFitter.m to read MaxShapes=6.

ZIP file containing all of the above Interactive Peak Fitter functions and demos.


index previous next
Tom O'Haver
Professor Emeritus
Department of Chemistry and Biochemistry
The University of Maryland at College Park
toh@umd.edu
http://www.wam.umd.edu/~toh
Document made with KompoZer

Number of unique visits since May 17, 2008:

counter
create hit