A common requirement
in scientific data processing is to detect peaks in a signal and
to measure their positions, heights, and widths. One way to do
this is to make use of the fact that the first derivative of a peak has a
downward-going zero-crossing
at the peak maximum. But the presence of random noise in real
experimental signal will cause many false zero-crossing simply due
to the noise. To avoid this problem, the technique described here
first smooths the first derivative
of the signal, before looking for downward-going zero-crossings,
and then it takes only those zero crossings whose slope exceeds a
certain predetermined minimum (called the "slope threshold")
at a point where the original signal exceeds a certain minimum
(called the "amplitude threshold"). By carefully adjusting
the smoothwidth, slope threshold, and amplitude threshold, it's
possible to detect only the desired peaks and ignore peaks that
are too small, too wide, or too narrow. Moreover, because
smoothing can distort peak signals, reducing peak heights and
increasing peak widths, this technique estimates the position,
height, and width of each peak by least-squares
curve-fitting of a segment of the original unsmoothed signal in the vicinity of
the zero-crossing. Thus, even if heavy smoothing of the first
derivative is necessary to provide reliable discrimination against
noise peaks, the peak parameters extracted by curve fitting are
not distorted. (This is useful primarily for signals that have
several data points in each peak, not for spikes that have only
one or two points). This technique is capable of measuring peak
positions and heights quite accurately, but the measurements of
peak widths and areas is most accurate if the peaks are Gaussian
in shape (or Lorentzian, in the
variant findpeaksL). For the most accurate measurement of
other peak shapes, or of highly overlapped peaks, the related
functions findpeaksb.m,
findpeaksb3.m, findpeaksfit.m
utilize non-linear iterative curve
fitting with selectable peak shape models.
The routine is now available in several different versions:
function P=findpeaksG(x, y, SlopeThreshold, AmpThreshold,
SmoothWidth, FitWidth, smoothtype)
This is my basic command-line function to locate and measure the
positive peaks in a noisy data sets. It detects peaks by looking
for downward zero-crossings in the smoothed first derivative that
exceed SlopeThreshold and peak amplitudes that exceed
AmpThreshold, and determines the position, height, and approximate
width of each peak by least-squares curve-fitting the top part of
the peak. Returns a list (in matrix P) containing the peak
number and the estimated position, height, width, and area of each
peak. It can find and measure over 1000 peaks per
second in very large signals. Version 5.0 has
improved error catching for very noisy data.
The data are passed to the findpeaksG function in the
vectors x and y (x = independent variable, y = dependent
variable). The other parameters are user-adjustable:
SlopeThreshold - Slope of the smoothed
first-derivative that is taken to indicate a peak. This
discriminates on the basis of peak width. Larger values of this
parameter will neglect broad features of the signal. A
reasonable initial value for Gaussian peaks is
0.7*WidthPoints^-2, where WidthPoints is the number of data
points in the half-width of the peak. AmpThreshold - Discriminates on the basis of peak height.
Any peaks with height less than this value are ignored. SmoothWidth - Width of the smooth function that is
applied to data before the slope is measured. Larger values of
SmoothWidth will neglect small, sharp features. A reasonable
value is typically about equal to 1/2 of the number of data
points in the half-width of the peaks. FitWidth - The number of points around the "top part" of
the (unsmoothed) peak that are taken to estimate the peak
heights, positions, and widths. A reasonable value is typically
about equal to 1/2 of the number of data points in the
half-width of the peaks. The minimum value is 3. Smoothtype (added
in Version 4, Sept. 2011) determines the smoothing algorithm
(see http://terpconnect.umd.edu/~toh/spectrum/Smoothing.html)
If smoothtype=1, rectangular (sliding-average
or boxcar)
If smoothtype=2, triangular (2 passes of
sliding-average)
If smoothtype=3, pseudo-Gaussian (3 passes of
sliding-average)
Basically, higher values yield greater reduction in
high-frequency noise, at the expense of slower execution. For a
comparison of these smoothing types, see SmoothingComparison.html.
Optimization of peak finding The
optimum values of the input arguments for findpeaksG and related
functions depend on the signal and on which features of the signal
are important for your work. Rough values for these parameters can
be estimated based on the width of the peaks that you wish to
detect, as described above, but in many cases it will be best to
fine-tune these parameters for your particular signal. The most
convenient way to do that is to use the interactive peak
detector iPeak (described below),
which allows you to adjust all of these parameters by simple
keypresses and displays the results graphically and instantly.
In the example shown on the left (using the interactive peak
detector iPeak program described below), the
important parts of the signal are two broad peaks at
x=4 and x=6, the second one half the height of the
first. The small jagged features are just random noise. We want to
detect the two peaks but ignore the noise. (The detected peaks are
numbered 1,2,3,...in the lower panel of this graphic). This is
what it looks like if the AmpThreshold is too small or too large, if the
SlopeThreshold is too small or too large, if
the SmoothWidth is too small or too large, if
the FitWidth is too small or too large. If
these parameters are within the optimum range for this measurement
objective, the findpeaksG functions will return something like
this (although the exact values will vary with the noise and with
the value of FitWidth):
Peak#
Position Height
Width
Area
1
3.9649
0.99919
1.8237
1.94
2
5.8675
0.53817
1.6671 0.955
How
is 'findpeaksG' different from 'max' in Matlab or 'findpeaks' in
the Signal Processing Toolkit?
The 'max' function simply returns the largest single
value in a vector. Findpeaks in the Signal
Processing Toolbox can be used to find the values and
indices of all the peaks in a vector that are higher than a
specified peak height and are separated from their neighbors by a
specified minimum distance. My version of findpeaks
(findpeaksG) accepts both an independent variable (x) and
dependent variable (y) vectors, finds the places where the average
curvature over a specified region is concave down, fits that
region with a least-squares fit, and returns the peak position (in
x units), height, width, and area, of any peak that exceeds a
specified height. For example, let's create a noisy series of
peaks (plotted on the right) and apply both of these
findpeaks functions to the resulting data.
Now, anyone just looking at this plot of data would count 16
peaks, with peak heights averaging about 10 units. Every time the
statements are run, the random noise is different, but you would
still count the 16 peaks, because the signal-to-noise ratio is not
bad. But the findpeaks function in the Signal Processing
Toolbox, [PKS,LOCS]=findpeaks(y,'MINPEAKHEIGHT',5,'MINPEAKDISTANCE',11)
counts anywhere from 11 to 20 peaks, with an average height (PKS) of 11.5.
In contrast, my findpeaksG function counts 16 peaks every time,
with an average
height of 10 ħ0.3. Much more reasonable.
findpeaksG(x,y,0.001,5,11,11,3)
It also measures the width and area, assuming the peaks are
Gaussian (or Lorentzian, in the variant findpeaksL). To be fair, findpeaks in the Signal Processing
Toolbox works better for peaks that have only 1-3 data
points on the peak; my function is better for peaks that
have more points.
findvalleys.
There is also a similar function for finding valleys (minima), called findvalleys.m, which works the same way
as findpeaksG.m, except that it locates minima instead of
maxima. Only valleys above (that is, more positive or less
negative than) the AmpThreshold are detected; if you wish to
detect valleys that have negative minima, then AmpThreshold must
be set more negative than that.
>>
x=[0:.01:50];y=cos(x);P=findvalleys(x,y,0,-1,5,5)
P =
1.0000 3.1416
-1.0000
2.3549 0
2.0000 9.4248
-1.0000
2.3549 0
3.0000 15.7080
-1.0000
2.3549 0
4.0000 21.9911
-1.0000
2.3549 0
5.0000 28.2743
-1.0000
2.3549 0
6.0000 34.5575
-1.0000
2.3549 0
7.0000 40.8407
-1.0000
2.3549 0
8.0000 47.1239
-1.0000
2.3549 0 The accuracy of the measurements of peak position,
height, width, and area by the findpeaksG function depends on the
shape of the peaks, the
extent of peak overlap, and signal-to-noise ratio. The width and
area measurements particularly are strongly influenced by peak
overlap, noise, and the choice of FitWidth. Isolated peaks of
Gaussian shape are measured most accurately. For peak of Lorentzian
shape, use findpeaksL.m
instead (the only difference is that the reported peak heights,
widths, and areas will be more accurate if the peak are actually
Lorentzian). See "ipeakdemo.m" below for an accuracy trial for
Gaussian peaks. For the most accurate measurements of highly
overlapping peaks of various shapes, use peakfit.m
or the Interactive
Peak Fitter (ipf.m).
For a direct comparison of the accuracy of findpeaksG vs
peakfit, run the script peakfitVSfindpeaks.m.
This script (figure on the left) generates four very noisy
peaks of different heights and widths, then applies
findpeaksG.m and peakfit.m to measure the peaks and
compares the results. The peaks detected by findpeaksG are
labeled "Peak 1", "Peak 2", etc. If you run this script
several times, you'll find that both methods work well
most of the time, with peakfit giving smaller errors in
most cases, but occasionally findpeaksG will miss the
first (lowest) peak and rarely it will detect an extra
peak that is not there if the signal is very noisy.
findpeaksb.mis a variant of findpeaksG.m that more accurately
measures peak parameters by using iterative least-square
curve fitting based on my peakfit.m function
This yields better peak parameter values that findpeaksG
alone, because it can be set for different peak shapes ,
because it fits the entire peak, not just the top
part, and because it has provision for background
subtraction (flat, linear or quadratic). This function
works best with isolated peaks that do not overlap. For
version 3, the syntax is P = findpeaksb(x,y,
SlopeThreshold, AmpThreshold, smoothwidth, peakgroup,
smoothtype, windowspan, PeakShape, extra, AUTOZERO). The first seven input
arguments are exactly the same as for the findpeaksG.m
function; if you have been using findpeaksG or iPeak to
find and measure peaks in your signals, you can use those
same input argument values for findpeaksb.m. The remaining
four input arguments of are for the peakfit function: "windowspan"
specifies the number of data points over which each peak
is fit to the model shape (it must be large enough to
cover the entire peak and get down to the background on
both sides of the peak); "PeakShape" specifies
the model peak shape: 1=Gaussian, 2=Lorentzian, etc (type
'help findpeaksb' for a list), "extra" is the
shape modifier variable that is used for the Voigt,
Pearson, exponentially broadened Gaussian and Lorentzian,
Gaussian/Lorentzian blend, and bifurcated Gaussian and
Lorentzian shapes to fine-tune the peak shape; "AUTOZERO"
is 0, 1, 2, or 3 for no, linear, quadratic, or flat
background subtraction.
The peak table returned by this function
has a 6th column listing the percent
fitting errors for each peak.
Here is a simple example with three Gaussians on a linear
background, comparing (a) plain findpeaksG,
to (b) findpeaksb without
background subtraction (AUTOZERO=0),
to (c) findpeaksb with background subtraction
(AUTOZERO=1).
x=1:.2:100;Heights=[1
2 3];Positions=[20 50 80];Widths=[3 3 3]; y=2-(x./50)+modelpeaks(x,3,1,Heights,Positions,Widths)+.02*randn(size(x));
plot(x,y); disp('
Peak
Position
Height
Width
Area % error') PlainFindpeaks=findpeaksG(x,y,.00005,.5,30,20,3) NoBackgroundSubtraction=findpeaksb(x,y,.00005,.5,30,20,3,150,1,0,0) LinearBackgroundSubtraction=findpeaksb(x,y,.00005,.5,30,20,3,150,1,0,1)
The demonstration script DemoFindPeaksb.m
shows how findpeaksb works with multiple
peaks on a curved background.
findpeaksb3.m is a variant of
findpeaksb.m that fits each detected peak along with the
previous and following peaks found by findpeaksG.m, so as to
deal better with overlap of the adjacent overlapping peaks. The
syntax is function FPB=findpeaksb3(x,y,
SlopeThreshold, AmpThreshold, smoothwidth, peakgroup,
smoothtype, PeakShape, extra, NumTrials, AUTOZERO, ShowPlots).
The demonstration script DemoFindPeaksb3.m shows
how findpeaksb3 works with multiple overlapping
Lorentzian peaks, as in the example on the left
(type "help findpeaksb3") for more.
The
function findpeaksplot.m
is a simple variant of findpeaksG.m that also plots the
x,y data and numbers the peaks on the graph (if any are
found). An example is shown on the right.
findpeaksplotL.m does the
same thing optimized for Lorentzian peak.
The function peakstats.m uses
the same algorithm as findpeaksG, but it computes and
returns a table of summary statistics of the peak
intervals (the x-axis interval between adjacent detected
peaks), heights, widths, and areas, listing the maximum,
minimum, average, and percent standard deviation of each,
and optionally plotting the x,y data with numbered peaks
in figure window 1, printing the table of peak statistics
in the command window, and plotting the histograms
of the peak intervals, heights, widths, and areas in the
four quadrants of figure window 2. Type "help peakstats".
The syntax is the same as findpeaksG, with the addition of
a 8th input argument to control the display and plotting.
Example:
x=[0:.1:1000];y=5+5.*cos(x)+randn(size(x));
PS=peakstats(x,y,0,-1,15,23,3,1);
Peak Summary Statistics
158 peaks detected
Interval
Height
Width
Area
Maximum
6.6195
10.8991
4.7749 47.3653
Minimum
5.9086
9.1022
2.4123 26.3111
Mean
6.2833
10.0183
3.2346 34.4027
% STD
1.7362
3.2154
13.028 10.9314
With the last input argument omitted or equal to
zero, the plotting and printing in the command window are
omitted; the numerical values of the peak statistics table
are returned as a 4x4 array, in the same order as the
example above.
FindpeaksE.m
is a variant of findpeaksG.m that additionally estimates
the percent relative fitting error of each peak (assuming
a Gaussian peak shape) and returns it in the 6th column of
the peak table.
Example:
Peak start and end. Defining
the "start" and "end" of the peak (the x-values where the peak
begins and ends) is a bit arbitrary because typical peak shapes
approach the baseline asymptotically far from the peak maximum.
You might define the peak start and end points as the x values
where the y value is some small fraction, say 1%, of the peak
height, but then the random noise on the baseline is likely to be
a large fraction of the signal amplitude at that point. Smoothing
to reduce noise is likely to distort and broaden peaks,
effectively changing their start and end points. One solution to
this dilemma is to fit each peak to a model shape, then calculate
the peak start and end from the model expression. That
minimizes the noise problem by fitting the data over the entire
peak, but it works only if the peaks can be accurately modeled by
available fitting programs. For example, Gaussian peaks can
be shown to reach a fraction a of the peak
height at x=pħsqrt(w^2 log(1/a))/(2
sqrt(log(2))) where p is the peak position and w
is the peak width. So, for example if a=.01, x=pħw*sqrt((log(2)+log(5))/(2
log(2)))=1.288784*w. Lorentzian peaks can
be shown to reach a fraction a of the peak height at x=pħsqrt[(w^2
- aw^2)/a]/2. If a=.01, x=pħ(3/2
sqrt(11)*w)=4.97493*w. The findpeaksG variants findpeaksGSS.m and findpeaksLSS.m, for Gaussian and
Lorentzian peaks respectively, compute the peak start and end
positions in this manner and return them in the 6th and 7th
columns of the peak table P. (For greater accuracy with
overlapping peaks, use peakfit.m
or the Interactive Peak Fitter (ipf.m)
and calculate the start and end from the peak positions and width
using the
formulas above).
findpeaksT.m
and findpeaksTplot.m are
variants of findpeaksG that measure the peak parameters by constructing
a triangle around each peak with its sides tangent to the
sides of the peak, as shown on the right (script that generated this graphic).
This method mimics the geometric construction method that was
formerly used to measure peak parameters manually. Peak height is
taken as the apex of the triangle, which is slightly higher than
the peak of the underlying curve. The performance of this method
is poor when the signals are very noisy or if the peaks overlap,
but in some cases the triangle construction method can be more
accurate for the measurement of peak area than the Gaussian method
if the peaks are asymmetric and of uncertain shape (see the demo
function triangulationdemo.m
for two examples).
findpeaksfit.m is essentially a serial
combination of findpeaksG.m and peakfit.m. It uses the
number of peaks found and the peak positions and widths determined
by findpeaksG as input for the peakfit.m function, which then fits
the entire signal with the specified peak model. This
combination yields better values that findpeaksG alone, because
peakfit fits the entire peak, not just the top part, and it deals
with non-Gaussian and overlapped peaks. However, it fits only
those peaks that are found by findpeaksG. The syntax is
function [P,FitResults,LowestError,BestStart,xi,yi] =
findpeaksfit(x, y, SlopeThreshold, AmpThreshold, smoothwidth,
peakgroup, smoothtype, peakshape, extra, NumTrials, autozero,
fixedparameters, plots)
The first seven input arguments are exactly the same as for the
findpeaksG.m
function; if you have been using findpeaksG or iPeak to find and
measure peaks in your signals, you can use those same input
argument values for findpeaksfit.m. The remaining six input
arguments of findpeaksfit.m are for the peakfit function; if you
have been using peakfit.m or ipf.m
to fit the peaks in your signals, then you can use those same
input argument values for findpeaksfit.m. Type "help findpeaksfit"
for more information.
Note: findpeaksfit.m differs from findpeaksb.m
in that findpeaksfit.m
fits all the found peaks at one time with a single
multi-peak model, whereas findpeaksb.mfits each peak separately with a single-peak model (or each
detected peak along with the previous and following peaks
for findpeaksb3.m).
As a result, findpeaksfit.m
works better with a relatively small number of peak that all
overlap, whereas findpeaksb.m
works better with a large number of isolated non-overlapping
peaks, and findpeaksb3.m
works for large numbers of peaks that overlap at most one or
two adjacent peaks.
findsteps.m, syntax:
P=findpulses(x, y, SlopeThreshold, AmpThreshold, SmoothWidth,
peakgroup), locates positive transient steps in noisy x-y time
series data, by computing the first derivative of y that exceed
SlopeThreshold, computes the step height as the difference between
the maximum and minimum y values over a number of data point equal
to "Peakgroup". Returns list (P) with step number, x position, y
position, and the step height of each step detected;
"SlopeThreshold" and "AmpThreshold" control step sensitivity;
higher values will neglect smaller features.
Increasing "SmoothWidth" reduces small sharp false steps caused by
random noise or by "glitches" in the data acquisition. See findsteps.png
for a real example. And findstepsplot.m
plots the signal and numbers the peaks. Rectangular pulses.
Rectangular pulses (square waves) require a different approach,
based on amplitude discrimination rather than differentiation. The
function "findsquarepulse.m"
(syntax S=findsquarepulse(t,y,threshold) locates the
rectangular pulses in the signal t,y that exceed a y-value of
"threshold" and determines their start time, average height
(relative to the baseline) and width. DemoFindsquare.m creates a test signal
(with a true height of 2636 and a height of 750) and calls findsquarepulse.m
to demonstrate. If the signal is very noisy, some preliminary
rectangular smoothing (e.g. using fastsmooth.m) before calling
findsquarepulse.m may be helpful to eliminate false peaks.
Using the peak table. All these peak
finding functions return the peak table as a matrix, with one row
for each peak detected and with several columns listing the peak
number, position, height, width, and area in columns 1 - 5 (with
additional columns included for the variants findpeaksnr.m, findpeaksGSS.m, and findpeaksLSS.m). You
can assign this matrix to a variable (e.g. P, in the
examples above) and then use Matlab/Octave notation
and built-in functions to extract specific information from that
matrix. The powerful combination of Matlab/Octave functions and
matrix/vector notation allows you to construct compact expressions
that extract the specific information you need. For example:
[P(:,2) P(:,3)]is the time
series of peak heights (peak position in the first column and peak
height in the second column) mean(P(:,3))
returns the average peak height of all peaks (because
peak height is in column 3). Also works with median.
max(P(:,3))
returns the maximum peak height. Also
works with min. hist(P(:,3))
displays the histogram of peak heights std(P(:,4))./mean(P(:,4))
returns the relative standard deviation of the peak widths (column
4) P(:,3)./max(P(:,3)) returns the
ratio of each peak height (column 3)
to the height of the highest peak detected. 100.*P(:,5)./sum(P(:,5)) returns the percentage
of each peak area (column 5)
of the total area of all peaks detected.
sortrows(P,3) sorts P by peak
height; sortrows(P,4) sorts P by peak
width (smallest to largest) To create "d" as
the vector of x-axis (position) differences between adjacent
peaks (because peak position is in column 2).
for n=1:length(P)-1;
d(n)=max(P(n+1,2)-P(n,2)); end
Using my val2ind.m
function (download this function and place in the Matlab path): val2ind(P(:,3),7.5)
returns the peak number of the peak whose height (column
3) is closest to 7.5 P(val2ind(P(:,2),7.5),3) returns
the peak height (column 3) of the peak whose position (column
2) is closest to 7.5 P(val2ind(P(:,3),max(P(:,3))),:) returns the row
vector of peak parameters of the highest peak in peak
table P.
DemoFindPeak.m is a
simple demonstration script using the findpeaksG function on noisy
synthetic data. Numbers the peaks and prints out the peak
table in the Matlab command window:
DemoFindPeakSNR is a variant of
DemoFindPeak.m that uses findpeaksnr.m
to compute the signal-to-noise ratio (SNR) of each
peak and returns it in the 5th column.
DemoFindPeaksb.m
is a similar demonstration script that uses the findpeaksb function on noisy
synthetic data consisting of several Gaussian peaks
superimposed on a curved background. (The findpeaksG
function would not give accurate measurements of peak
height, width, and area for this signal, because it does
not correct for the background). Peak # Position
Height Width
Area Fitting error Measuredpeaks = 1
599.13 1.0059
49.533 53.044 9.6798 2
799.44 1.0206
45.073 48.974 8.6772 3
1000
3.9501 49.221
206.99 2.2302
4
1200.8 1.9413
49.133 101.54 5.195
5
1599.8 3.9979
49.138 209.14 2.4324
.....
Peak Identification
The command line function idpeaks.m is
used for identifying peaks according to their x-axis maximum
positions. The syntax is
It finds peaks in the signal "DataMatrix" (x-values in column 1
and y-values in column 2), according to the peak detection
parameters "AmpT", "SlopeT", "SmoothWidth", "FitWidth" (see the
"findpeaksG" function above), then compares the found peak
positions (x-values) to a database of known peaks, in the form of
an array of known peak maximum positions ('Positions') and
matching cell array of names ('Names'). If the position of a peak
found in the signal is closer to one of the
known peaks by less than the specified maximum error ('maxerror'),
that peak is considered a match and its peak position, name,
error, and peak amplitude (height) are entered into the output
cell array "IdentifiedPeaks". The full list of detected peaks,
identified or not, is returned in "AllPeaks". Use "cell2mat" to
access numeric elements of IdentifiedPeaks,e.g. cell2mat(IdentifiedPeaks(2,1))
returns the position of the first identified peak, cell2mat(IdentifiedPeaks(2,2))
returns its name, etc. Obviously for your own applications, it's up to you to
provide your own array of known peak maximum positions
('Positions') and matching cell array of names ('Names') for
your particular types of signals. The related function idpeaktable.m
does the same thing for a peak table P returned by any of my peak finder
or peak fitting functions, having one row for each peak and
columns for peak number, position, and height as the first three
columns. The syntax is [IdentifiedPeaks] =
idpeaktable(P,maxerror,Positions,Names). The interactive
iPeak function described in the next
section has this function built inas one of the
keystroke commands.
Example: Download idpeaks.zip, extract it, and place the
extracted files in the Matlab or Octave path. This
contains a high-resolution atomic emission spectrum of copper
('spectrum') and a data table of known Cu I and II atomic lines
('DataTable') containing their positions and names.
iPeak is a
keyboard-operated
interactive peak finder for time series data, based on the "findpeaksG.m" and"findpeaksL.m"
functions, for Matlab only.It
accepts data in a single vector, a pair of vectors, or a matrix
with the independent variable in the first column and the
dependent variable in the second column:
Example 1:One input argument; data in single vector >> y=cos(.1:.1:100);ipeak(y)
Example 2: One input argument; data in two columns of a matrix >> x=[0:.01:5]';y=x.*sin(x.^2).^2;M=[x y];ipeak(M)
Example 3: Two input
arguments; data in separate x and y vectors >>
x=[0:.1:100];y=(x.*sin(x)).^2;ipeak(x,y);
Example 4: When you start iPeak using the simple
syntax above, if it starts off by picking up far too many
or too few peaks, you can add an additional number (after the data) to control peak
sensitivity.
>>
x=[0:.1:100];y=5+5.*cos(x)+randn(size(x));ipeak(x,y,10); or
>> ipeak([x;y],10); or
>> ipeak(humps(0:.01:2),3)
or >>
x=[0:.1:10];y=exp(-(x-5).^2);ipeak([x' y'],1)
The additional numeric argument is an estimate of maximum
peak density(PeakD), the ratio
of the typical peak width to the length of the entire data
record. Small values detect fewer peaks; larger values detect more peaks. It effects only the
starting values for
the peak detection parameters. (It's just a quick way to set
reasonable initial values of the peak detection parameters, so you won't have so much adjusting
to do).
Peaks in annual sunspot numbers from 1700 to 2009
(download the datafile).
Sunspot data downloaded
from NOAA,
iPeak displays the
entire signal in the lower half of the Figure window and an
adjustable zoomed-in section in the upper window. Pan and zoom
the portion in the upper window using the cursor arrow
keys. The
peak closest to the center of the upper window is labeled in the
upper left of the top window, and it peak position, height, and
width are listed. The Spacebar/Tab keysjump to the next/previous detected
peak and displays it in the upper window at the current zoom
setting (use
the up and down cursor arrow keys to adjust the zoom range). Or you can pressthe J key to jump to a specified peak
number.
Adjust the peak detection
parameters AmpThreshold (A/Z keys), SlopeThreshold (S/X), SmoothWidth
(D/C), FitWidth (F/V) so that it detects the
desired peaks and ignores those that are too small, too broad,
or too narrow to be of interest. You can also type in a specific value of
AmpThreshold by pressing Shift-A or a specific value of
SlopeThreshold by pressing Shift-S. Detected peaks are numbered from
left to right.
Press P to display the peak table of all the detected
peaks (Peak #, Position, Height, Width, Area, and percent
fitting error): Gaussian shape mode (press Shift-G
to change) Window span: 169 units Linear baseline subtraction
Peak# Position
Height
Width
Area Error
1
500.93
6.0585
34.446 222.17
9.5731
2
767.75
1.8841
105.58
211.77 25.979
3
1012.8
0.20158
35.914
7.7 269.21
............. Press Shift-G to switch between Gaussian and
Lorentzian shape modes. Press Shift-P to save peak table
as disc file. Press U to switch between peak and valley
mode. Don't forget
that only valleys above (that is,
more positive or less negative than) the AmpThreshold are
detected; if you wish to detect valleys that have negative
minima, then AmpThreshold must be set more negative than
that. Note: to speed up the operation for
signals over 100,000 points in length, the lower window is
refreshed only when the number of detected peaks changes or if
the Enter key is pressed. Press K to see all
the keystroke commands.
Press U key to switch
between peak and valley mode.
If the density of data points on the peaks is too low
- less than about 4 points - the peaks may not be reliably
detected; you can improve reliability by using the
interpolation command (Shift-I) to re-sample the data
by linear interpolation to a larger number of
points. Conversely, if the density of data
points on the peaks of interest is very high - say, more than
100 points per peak - then you can speed up the operation of
iPeak by re-sampling to a smaller number
of points.
Peak
Summary Statistics.The E key prints a table
of summary statistics of the peak intervals (the x-axis
interval between adjacent detected peaks), heights, widths,
and areas, listing the maximum, minimum, average,
and percent standard deviation, and displaying the histogramsof the peak intervals, heights,
widths, and areas in figure window 2. Peak
Summary Statistics
149 peaks detected
No baseline correction
Interval
Height
Width
Area
Maximum
1.3204
232.7724
0.33408 80.7861
Minimum
1.1225
208.0581
0.27146 61.6991
Mean
1.2111
223.3685
0.31313 74.4764
% STD
2.8931
1.9115
3.0915 4.0858
Example
5: Six
input arguments. As above, but input arguments 3 to 6 directly
specifies initial values of AmpThreshold (AmpT), SlopeThreshold (SlopeT), SmoothWidth (SmoothW), FitWidth (FitW). PeakD is ignored in this case, so
just type a '0' as the second argument after the data matrix). >>
ipeak(datamatrix,0,.5,.0001,20,20);
Pressing 'L' toggles ON and OFF the peak labels in the
upper window.
Keystrokes allow you to pan
and zoom the upper window, to inspect each peak in detail if
desired. You
can set the initial values of pan and zoom in optional input
arguments 7
('xcenter') and
8 ('xrange'). See example 6 below.
The Y keytoggles between linear and log
y-axis scale in the lower window (a log axis is good for
inspecting signals with high dynamic range; it effects only the
lower window display and has no effect on the peak detection or
measurements).
Log scale (Y key) and background correction
mode (T key)
Example 6: Eight input arguments. As above, but input arguments 7 and 8 specify the initial pan and zoom settings, 'xcenter' and 'xrange', respectively. In this example, the x-axis data
are wavelengths in nanometers (nm), and the upper window zooms in on a very
small0.4 nm region centered on 249.7 nm.
(These data,
provided in the ZIP file, are from a
high-resolution atomic spectrum).
>> load
ipeakdata.mat >> ipeak(Sample1,0,100,0.05,3,4,249.7,0.4); Baseline correction modes. The T key cycles thebaseline correction mode from off, to linear, to quadratic,
to flat mode, then back to off. The current mode is displayed above the upper panel. When the
baseline correction mode
is OFF, peak heights are measured relative to zero. (Use this
mode when the
baseline is zero or if you have previously subtracted the baseline
from the entire signal using the Bkey). In thelinear orquadraticmodes,
peak heights are automatically measured relative to the local
baseline interpolated from the points at the ends of the segment displayed in
the upper panel; use
the zoom controls to isolate a group of peaks
so that the
signal returns to the local baseline at the beginning and end of the segment displayed in the upper window. The peak heights, widths, and areas in the peak table (R or P keys)will be automatically corrected for the baseline.Thelinear orquadraticmodes will
work best if the
peaks are well separated so that the signal returns to the local
baseline between the peaks. (If the peaks are highly overlapped, or if they are not
Gaussian in shape, the best results will be obtained by using
the curve fitting function - the N or M keys. The flat mode is used only for curve fitting
function, to account
for a flat baseline offset without reference to the edges
of the signal segment being fit).
Example 7: Nine input arguments. As example 6, but the 9^{th}
input argument sets
the background correction mode (equivalent
to pressing the T key)' 0=OFF; 1=linear; 2=quadratic,
3=flat. If not specified, itis initially OFF. >> ipeak(Sample1,0,100,0.00,3,4,249.7,0.4,1);
Converting to command-line functions. To aid in
writing your own scripts and function to automate processing, the
'Q' key prints out the findpeaksG, findpeaksb, and
findpeaksfit commands for the segment of the signal in the upper
window and for the entire signal, with most or all of the input
arguments in place, so you can Copy and Paste into your own
scripts. The 'W' key similarly prints out the peakfit and
ipf commands.
Shift-Ctrl-S transfers the current signal to iSignal.m
and Shift-Ctrl-P transfers the current signal to
Interactive Peak Detector (iPeak.m), if those functions
are installed in your Matlab path.
Ensemble averaging.
For signals that contain repetitive waveform patterns occurring in
one continuous signal, with nominally the same shape except for
noise, the ensemble averaging function (Shift-E) can
compute the average of all the repeating waveforms. It works by
detecting a single peak in each repeat waveform in order to
synchronize the repeats (and therefore does not require that the
repeats be equally spaced or synchronized to an external reference
signal). To use this function, first adjust the peak detection
controls to detect only one peak in each repeat pattern,
then zoom in to isolate any one of those repeat patterns, and then
press Shift-E. The average waveform is displayed in Figure
2 and saved as EnsembleAverage.mat in the current directory.See iPeakEnsembleAverageDemo.m. Normal and Multiple Peak
fitting: The N key applies iterative
curve fitting to the detected
peaks that are displayed in the upper window (referred
to here as "Normal" curve fitting). The use of the iterative
least-squares function can result in more accurate peak
parameter measurements than the normal peak table (R or P keys), especially if the peaks are
non-Gaussian in shape or are highly overlapped. (If the peaks are superimposed on a
background, select
the baseline
correction mode using the
T key, then use the pan and zoom keys to
select a peak or a group of overlapping peaks in the upper
window, with the signal returning all the way to the local
baseline at the ends of the upper window if you are using the
linear or quadratic baseline modes). Make sure that the AmpThreshold, SlopeThreshold, SmoothWidth are
adjusted so that each peak is numbered once. Only numbered peaks are fit.Then press the N
key, type a number for the desired peak shape from the menu displayed in the Command window:
Gaussian: y=exp(-((x-pos)./(0.6005615.*width))
.^2) Gaussians with independent positions and
widths : 1 (default) Exponentional-broadened Gaussian (equal time
constants): b Exponentional-broadened equal-width Gaussian :
8 Exponentional-broadened Gaussian (independent
time constants): 31 Gaussians with the same widths : 6 Gaussians with preset fixed widths : 11 Fixed-position Gaussians : 16 Asymmetrical Gaussians with unequal
half-widths on both sides : 14 Lorentzian: y=ones(size(x))./(1+((x-pos)./(0.5.*width)).^2) Lorentzians with independent positions and
widths : 2 Exponentional-broadened Lorentzian : 18 Equal-width Lorentzians : 7 Fixed-width Lorentzian : 12 Fixed-position Lorentzian : 17 Asymmetrical Lorentzians with unequal
half-widths on both sides : 15 Blended sum of Gaussian and Lorentzian functions (equal
blends): 13 Blended sum of Gaussian and Lorentzian functions
(independent blends): 33 Voigt profile (equal alphas): 20 Voigt profile (independent alphas): 30 Logistic: n=exp(-((x-pos)/(.477.*wid)).^2); y=(2.*n)./(1+n)
: 3 Pearson:
y=ones(size(x))./(1+((x-pos)./((0.5.^(2/m)).*wid)).^2).^m : 4 Pearson with independent shape factors, m : 32 Exponential pulse: y=(x-tau2)./tau1.*exp(1-(x-tau2)./tau1)
: 9 Alpha function: y=(x-spoint)./pos.*exp(1-(x-spoint)./pos);
: 19 Sigmoid: y=1/2 + 1/2* erf((x-tau1)/sqrt(2*tau2)) : 10 Lognormal: 25
and press Enter, then type in a number of repeat
trial fits and
press Enter(the default is 1; start with that and then increase if
necessary). If you have selected a variable-shape peak
(numbers 4, 5, 8 ,13, 14, 15, 18, 20, 30-33), the program will
ask you to type in a number that fine-tunes the shape . The program will then
perform the fit, display the results graphically in Figure
window 2, and print out a table of results in the command
window, e.g.:
Peak shape (1-8): 2 Number of trials: 1
Least-squares fit to
Lorentzian peak model Fitting Error 1.1581e-006% Peak#
Position Height Width
Area
1
100
1 50
71.652
2
350
1 100 146.13
3
700
1 200 267.77
Normal
Peak Fit (N key) applied to a group of six overlapping
Gaussians peaks
There is also a "Multiple" peak fit function (M
key) that will attempt to apply iterative curve fitting to allthe detected peaks in the signal
simultaneously. Before using this function, it's best to turn off the
automatic baseline
correction (T
key) and use the
multi-segment baseline correction function (B
key) to remove the background (because the baseline correction function will probably not be able
to subtract the baseline from the entire signal). Then press M and proceed as
for the normal curve fit. A multiple curve fit may take a minute
or so to complete if the number of peaks is large, possibly
longer than the Normal curve fitting function on
each group of peaks separately.
The N and M key fitting functions perform non-linear iterative curve fitting
using the peakfit.m
function. The number of peaks and the starting values of peak
positions and widths for the curve fit function are
automatically supplied by the the findpeaksG function, so it is
essential that the peak detection variables in iPeak be adjust
so that all the peaks in the selected region are detected and
numbered once. (For more flexible curve fitting, use ipf.m,
which allows manual optimization of peak groupings and start
positions).
Example 8. This example generates four Gaussian
peaks, all with the exact same peak height (1.00) and area
(1.773). The first peak (at x=4) is isolated, the second peak
(x=9) is slightly overlapped with the third one, and the last two
peaks (at x= 13 and 15) are strongly overlapped.
x=[0:.01:20]; y=exp(-(x-4).^2)+exp(-(x-9).^2)+exp(-(x-13).^2)+exp(-(x-15).^2); ipeak(x,y)
By itself, iPeak does a fairly good
just of measuring peaks positions and heights by fitting just
the top part of the peaks, because the peaks are Gaussian, but
the areas and
widths of the last two
peaks (which should be 1.665 like the others) are quite a bit too large because of the overlap: Peak# Position
Height Width Area 1
4
1
1.6651 1.7727 2
9
1
1.6651 1.7727 3
13.049 1.02
1.8381 1.9956 4
14.951 1.02
1.8381 1.9956
In this case, curve fitting (using the N or M
keys) does a much better job, even if the overlap is even greater, but only if
the peak shape is known: Peak#
Position Height
Width Area
1
4
1
1.6651 1.7724
2
9
1
1.6651 1.7725 3
13
1
1.6651 1.7725 4
15 0.99999
1.6651 1.7724
Note 1: If the peaks are too overlapped to be detected
and numbered separately, try pressing the H key to
activate the sharpen function before pressing M
(version 4.0 and above only).This does not effect the signal
itself, only the peak detection. Note 2: If you plan to use a variable-shape
peak (numbers 4, 5, 8 ,13, 14, 15,
18, or 20) for the Multiple
peak fit, it's a good idea to obtain a reasonable value for the
requested "extra" shape parameter by performing a Normal
peak fit on an isolated single peak (or small group of
partly-overlapping peaks) of the same shape, then use
that value for the Multiple curve fit of the entire
signal. Note 3: Although the peak widths can vary from peak to peak, the curve
fit routines assume that the peak shape is the same for all peaks in one
fitting operation, so if the peak shape varies across the
signal, use the Normal peak fit to fit each section with
a different shape rather than the Multiple peak fit. Peak
identification. There is an optional "peak
identification" function if optional input arguments 9 ('MaxError'), 10 ('Positions'), and 11 ('Names') are included. The "I" key
toggles this function ON and OFF. This function compares the
found peak positions (maximum x-values) to a reference database
of known peaks, in the form of an array of known peak maximum
positions ('Positions') and matching cell array of names
('Names'). If the position of a found peak in the signal is
closer to one of the known peaks by less than the specified
maximum error ('MaxError'), then that peak is considered a match
and its name is displayed next to the peak in the upper window.
When when the 'O' key is pressed (the letter 'O'), the
peak positions, names, errors, and amplitudes are printed out in
a table in the command window.
Example
9: Eleven input arguments. As above, but also specifies
'MaxError', 'Positions', and 'Names' in optional input
arguments 9, 10, and 11, for peak identification function.
Pressing the 'I' key toggles off and on the peak
identification labels in the upper window.These data (provided in the ZIP file) are from a high-resolution atomic
spectrum (x-axis in nanometers).
The peak identification function applied to a high-resolution
atomic spectrum.
Three peaks near 296 nm isolated and identified. Press
the I key to display the peak ID names.
Pressing "O" prints the peak positions, names,
errors, and amplitudes in a table in the command window.
Name
Position Error
Amplitude 'Mg I 295.2'
[295.2]
[0.058545] [129.27] 'Cu
296.1 nm' [296.1]
[0.045368] [124.6]
'Hg 297.6 nm'
[297.6] [0.023142]
[143.95]
Here is another example, from a
large atomic emission spectrum with over 10,000 data points and
many hundreds of peaks. The reference table of known peaks in
this case is taken from Table 1 of ASTM
C1301 - 95(2009)e1. With the settings I was using, 10
peaks were identified, shown in the table below. You can see
that some of these elements have more than one line identified.
Obviously, the lower the settings of the AmpThreshold,
SlopeThreshold, and SmoothWidth, the more peaks will be
detected; and the higher the setting of "MaxError", the more
peaks will be close enough to be considered identified. In this
example, the element names in the table below are hot-linked to
the screen image of the corresponding peak detected as
identified by iPeak. Some of these lines, especially Nickel
231.66nm, Silicon 288.18nm, and Iron 260.1nm, are rather weak
and have poor signal-to-noise ratios, so their identification
might be in doubt (especially Iron, since its wavelength error
is greater than the rest). It's up to the experimenter to decide
which peaks are strong enough to be significant. In this
example, I used an independently published table of element
wavelengths, rather than data acquired on that same instrument,
which depends on the accurate wavelength calibration of the
instrument; in fact, the wavelength calibration is likely
excellent, based
on the very small error for the two well-known and relatively
strong Sodium lines at 589 and 589.59 nm. (Even so, I set
MaxError to 0.2 nm in this example to loosen up the wavelength
matching requirements).
Note: The
ZIP file contains the latest version of the iPeak function as
well as some sample data to demonstrate peak identification
(Example 8). Obviously for your own applications, it's
up to you to provide your own array of known peak maximum
positions ('Positions') and matching cell array of names
('Names') for your particular types of signals.
Keyboard Controls (version
7): Pan
signal left and right...Coarse pan: < or >
Fine pan:
left or right cursor arrow keys
Nudge
one point
left or right: [ and ] Zoom
in and out.............Coarse zoom: / or '
Fine
zoom: up or down cursor arrow keys
Resets pan and zoom.........ESC Select entire signal........Ctrl-A Refresh entire plot.........Enter (Updates
cursor position in lower plot)
Change plot color...........Shift-C (cycles
through standard colors)
Adjust AmpThreshold.........A,Z (Larger
values ignore short peaks) Type
in AmpThreshold........Shift-A (Type value and press
Enter)
Adjust SlopeThreshold.......S,X (Larger
values ignore broad peaks) Type
in SlopeThreshold......Shift-S(Type value and press Enter) Adjust SmoothWidth..........D,C
(Larger values ignore sharp peaks}
Adjust FitWidth.............F,V (Adjust to
cover just top part of peaks)
Toggle sharpen mode ........H Helps detect
overlapped peaks.
Baseline correction.........B, then click
baseline at multiple points
Restore original signal.....G to cancel
previous background subtraction
Invert signal...............- Invert
(negate) the signal (flip + and -)
Set minimum to zero.........0 (Zero) Sets minimum
signal to zero Interpolate signal..........Shift-I
Interpolate (re-sample) to N points
Toggle log y mode OFF/ON....YPlot log Y axis on lower graph
Cycles baseline modes.......T 0=none; 1=linear;
2=quadratic; 3=Flat. Toggle valley
mode OFF/ON...U Switch to valley mode
Gaussian/Lorentzian switch..Shift-G Switch
between Gaussian and Lorentzian mode
Print peak table............P Prints Peak
#, Position, Height, Width
Save peak table.............Shift-P Saves
peak table as disc file
Step through peaks..........Space/Tab Jumps
to next/previous peak Jump to peak number.........J Type peak number and press Enter
Normal peak fit.............N Fit peaks in
upper window with peakfit.m
Multiple peak fit...........M Fit all peaks
in signal with peakfit.m Ensemble
average all peaks..Shift-E (Read instructions first)
Print keyboard commands.....K Prints this
list
Print findpeaks arguments...Q Prints
findpeaks function with arguments.
Print ipeak arguments.......W Prints ipeak
function with all arguments.
Print report................R Prints Peak
table and parameters
Print peak statistics.......E prints mean,
std of peak intervals, heights, etc.
Peak labels ON/OFF......... L Label all
peaks detected in upper window.
Peak ID ON/OFF..............I Identifies
closest peaks in 'Names' database.
Print peak IDs..............O Prints table
of peaks IDs Switch to
ipf.m.............Shift-Ctrl-FTransfer current signal
to
Interactive Peak Fitter Switch to
iSignal...........Shift-Ctrl-S Transfer current
signal
to iSignal.m
demoipeak.m is a
simple demo function that generates a noisy signal
with peaks, calls iPeak, and then prints out a table
of the actual peak parameters and a list of the
peaks detected and measured by iPeak for comparison.
Before running this demo, ipeak.m
must be downloaded and placed in the Matlab path.
The ZIP file at http://terpconnect.umd.edu/~toh/spectrum/ipeak7.zip contains several demo
functions (ipeakdemo.m, ipeakdemo1.m, etc) that
illustrate various aspects of the iPeak function and
how it can be used effectively. Download the zip
file, right-click and select "Extract all", then put
the resulting files in the Matlab path and run them
by typing their names at the Matlab command window
prompt. To test for the proper installation and
operation of iPeak, run testipeak.
ipeakdemo:
effect of the peak detection parameters
Four Gaussian peaks with the same heights but
different widths (10, 30, 50 and 70 units) This
demonstrates the effect of SlopeThreshold and SmoothWidth
on peak detection. Increasing SlopeThreshold (S key) will
discriminate against the broader peaks. Increasing
SmoothWidth (D key) will discriminate against the narrower
peaks and noise. FitWidth (F/V keys) controls the number
of points around the "top part" of the (unsmoothed) peak
that are taken to estimate the peak heights, positions,
and widths. A reasonable value is ordinarily about equal
to 1/2 of the number of data points in the half-width of
the peaks. In this case, where the peak widths are quite
different, set it to about
1/2 of the number of data points in the narrowest
peak.
Demonstration
of background correction, for separated, narrow peaks on
a large baseline. Each time you run this demo, you will
get a different set of peaks and noise. A table of the
actual peak positions, heights, widths, and areas is
printed out in the command window. Jump to the
next/previous peaks using the Spacebar/Tab keys.
Hint: Set the
linear
baseline correction mode (T key),
adjust the zoom setting so that the peaks are shown
one at a time in the upper window, then press the P
key to display the peak table.
ipeakdemo2:
peak overlap and the curve fitting functions.
Demonstration
of error caused by overlapping peaks on a large offset
baseline. Each time you run this demo, you will get a
different set of peaks and noise. A table of the
actual peak positions, heights, widths, and areas is
printed out in the command window. (Jump to the
next/previous peaks using the Spacebar/Tab
keys).
Hint: Use
the B key and click on the baseline points, then
press the P key to display the peak table. Or
turn on the background correction mode (T
key) and use the Normal curve fit (N key) with peak shape
1 (Gaussian).
ipeakdemo3: Baseline shift caused by
overlapping peaks
Demonstration
of overlapping Lorentzian peaks, without an added
background. A table of the actual peak
positions, heights, widths, and areas is printed out in
the command window; in this example, the true peak heights
are 1,2 3,...10. Overlap of peaks can cause significant
errors in measuring peak parameters, especially for
Lorentzian peaks, because they have gently sloping sides
that contribute to the baseline of any peaks in the
region.
Hint: turn
OFF the background
correction mode (T key) and use the
Normal curve fit (N key) to fit small groups of
2-5 peaks numbered in the upper window, with peak shape 2
(Lorentzian). For the greatest
accuracy in measuring a particular peak, include one
or two additional peaks on either side, to help
account for the baseline shift caused by the sides of
those neighboring
peaks. Alternatively, if the total number of
peaks is not too great, you can use the Multiple curve
fit (M key) to fit the entire signal in the lower
window.
Detection
and measurement of four peaks in a very noisy signal.
The signal-to-noise ratio of first peak is 2. Each time you
run this demo, you will get a different set of noise. A table of the
actual peak positions, heights, widths, and areas is
printed out in the command window. Jump to the
next/previous peaks using the Spacebar/Tab keys.
The peak at
x=100 is usually detected, but the accuracy of peak
parameter measurement is poor because of the low
signal-to-noise ratio. ipeakdemo6 is similar
but has the option of different kinds on noise
(white, pink, proportional, etc).
Hint: With
very noisy signals it is usually best to increase SmoothWidth
and FitWidth to help reduce the effect
of the noise.
In this demo
the peaks are so highly overlapped that only one or
two of the highest peaks yield distinct maxima that
are detected by iPeak. The height, width, and area
estimates are highly inaccurate because of this
overlap. The normal peak fit function ('N' key) would be
useful in this case, but it depends on iPeak for the
number of peaks and for the initial guesses, and so
it would fit only the peaks that were found and
numbered.
To help in
this case, pressing the 'H' key will activate the
peak sharpen function that decreases peak width and
increases peak height of all the peaks, making it easier
to detect and number all the peaks for use by the
peakfit function (N
key). Note: peakfit fits the original unmodified peaks; the
sharpening is used only to help locate the peaks to
provide peakfit with suitable starting values..
Which to use: iPeak, ipf, or peakfit? Download these
Matlab demos that compare iPeak.m with ipf.m and peakfit.m
for signals with a few peaks and signals
with many peaks and that shows how to
adjust iPeak to detect broad or narrow peaks.
These are self-contained demos that include all required Matlab
functions. Just place them in your path and click Run or type their name at
the command prompt. Or you can download all these demos together
in idemos.zip. For peak fitting, the N-key
and M-key functions of iPeak have the advantage of using
the peak positions and widths determined by the automatic peak
finder routine as the first-guess vector for peakfit.m, often
resulting in faster and more robust fits that ipf.m or peakfit.m
themselves with default first guesses.
Try fitting the x,y data in ipeakexampledata.mat
for some examples of using
iPeak for fitting overlapping peaks. On the
other hand, ipf.m is better for fitting peaks that don't make
distinct maxima and thus are not detected detected by
the automatic peak finder routine.
Peak Finding and
Measurement Spreadsheets
Simple peak detection. The spreadsheet
pictured above (PeakDetection.xls
and PeakDetection.xlsx)
implements the simple derivative zero-crossing peak detection
method described above. The input x,y data are contained in
Sheet1, column A and B, rows 9 to 264. (You can type or paste
your own data there). The amplitude threshold and
slope threshold are set in cells B4 and E4,
respectively. Smoothing and
differentiation are performed by the convolution technique used
by the spreadsheets DerivativeSmoothing.xls described previously.
The Smooth Width and the Fit Width are both controlled by the
number of non-zero convolution coefficients in row 6,
columns J through Z. (In order to compute a symmetrical first
derivative, the coefficients in columns J to Q must be the
negatives of the positive coefficients in columns S to Z).
The original data and the smoothed derivative are shown in the
two charts. To detect peaks in the data, a series of three
conditions are tested for each data point in columns F, G, and
H, corresponding to the three nested loops in findpeaksG.m:
Is the signal greater than Amplitude
Threshold? (line 45 of findpeaksG.m; column F in the
spreadsheet)
Is there a downward directed zero crossing
in the smoothed first derivative? (line 43 of findpeaksG.m;
column G in the spreadsheet)
Is the slope of the derivative at that point
greater than the Slope Threshold? (line 44 of findpeaksG.m;
column H in the spreadsheet)
If the answer to all three questions is yes (highlighted by blue
cell coloring), a peak is registered at that point (column I),
counted in column J, and assigned an index number in column K.
The X-axis position of the peak is determined for each detected
peak and placed in the table in columns AC and AD. See PeakDetectionExample.xlsx
(or .xls) for an example
with data. (You can easily extend
the spreadsheet to longer columns of data by dragging down the
last row of columns A through K as needed, and you can extend
the spreadsheet to greater number of peaks by dragging down the
last row of columns AC and AD as needed and
modifying cell R7 to include the additional peaks).
Peak detection with
least-squares measurement. An extension of that method
is made in PeakDetectionAndMeasurement.xlsx
(screen image),
which makes the assumption that the peaks are Gaussian and
measures their height, position, and width more precisely
using a least-squares technique, just like "findpeaksG.m".
For the first 10 peaks found, the x,y original unsmoothed
data are copied to Sheets 2 through 11, respectively, where that
segment of data is subjected to a Gaussian least-squares fit,
using the same technique as GaussianLeastSquares.xls.
The best-fit Gaussian parameter results are copied back to
Sheet1, in the table in columns AH-AK. (In its present
form. the spreadsheet is limited to measuring 10 peaks, although
it can detect any number of peaks. Also it is limited in
Smooth Width and Fit Width by the 17-point convolution
coefficients).
The spreadsheet is available in OpenOffice (ods) and in Excel (xls) and (xlsx)
formats. They are functionally equivalent and differ only
in minor cosmetic aspects. There are two example spreadsheets (A, and B) with calculated noisy
waveforms that you can modify. Note: To enter data into
the .xls and .xlsx versions, click the "Enable Editing" button
in the yellow bar at the top.
Expanding the PeakDetectionAndMeasurement.xlsx spreadsheet to
larger numbers of measured peaks is more difficult. You'll have to
drag down row 17, columns AC through AK, and adjust the formulas
in those rows for the required number of additional peaks, then
copy all of Sheet11 and paste it into a series of new sheets
(Sheet12, Sheet13, etc), one for each additional peak, and finally
adjust the formulas in columns B and C in each of these additional
sheets to refer to the appropriate row in Sheet1. Modify these
additional equations in the same pattern as those for peaks 1-10.
Spreadsheet vs Matlab/Octave. A comparison
of this spreadsheet to its Matlab/Octave equivalent findpeaksplot.m is instructive. On
the positive side, the spreadsheet literally "spreads out" the
data and the calculations spatially over a large number of cells
and sheets, breaking down the discrete steps in a very graphic
way. In particular, the use of conditional
formatting in columns F through K make the peak detection
decision process more evident for each peak, and the
least-squares sheets 2 through 11 lay out every detail of those
calculations. Spreadsheet programs have many flexible and
easy-to-use formatting options to make displays more attractive.
On the down side, a spreadsheet as complicated as PeakDetectionAndMeasurement.xlsx
is far more difficult to construct than its Matlab/Octave
equivalent. Much more serious, the spreadsheet is less
flexible and harder to expand to larger signals and larger
number of peaks. In contrast, the Matlab/Octave equivalent,
while requiring some understanding of programming to create
initially, is faster to use, much more flexible, and can easily
handle signals and smooth/fit widths of any size. Moreover,
because it is written as a Matlab/Octave function, it can be readily
employed as an element in your own custom Matlab/Octave
programs to perform even larger tasks. It's a little harder to
do that sort of thing in a spreadsheet.
To compare the computation speed of this
spreadsheet peak finder to the Matlab/Octave equivalent, we can
take as an example the spreadsheet PeakDetectionExample2.xls,
or PeakDetectionExample2.ods,
which computes and plots a test signal consisting of a noisy
sine-squared waveform and then detects and measures 10 peaks in
that waveform and displays a table of peak parameters.
This is equivalent to the Matlab/Octave script:
This page is part of "A
Pragmatic Introduction to Signal Processing", created
and maintained by Prof.
Tom O'Haver , Department of Chemistry and Biochemistry, The
University of Maryland at College Park. Comments, suggestions and
questions should be directed to Prof. O'Haver at toh@umd.edu.
Number of unique visits since May 17, 2008: