AppendixA

B. Signal or Noise?

This experimental signal in this case study was unusual because it did not look like a typical signal when plotted; in fact, it looked a lot like noise at first glance. The figure below compares the raw signal (bottom) with the same number of points of normally-distributed white noise (top) with a mean of zero and a standard deviation of 1.0 (obtained from the Matlab/Octave 'randn' function).

As you can see, the main difference is that the signal has more large 'spikes', especially in the positive direction. This difference is evident when you look at the descriptive statistics of the signal and the randn function:

DESCRIPTIVE STATISTICS	Raw signal	random noise (randn function)
Mean	0.4	0
Maximum	38	about 5 - 6
Standard Deviation (STD)	1.05	1.0
Inter-Quartile Range (IQR)	1.04	1.3489
Kurtosis	38	3
Skewness	1.64	0

You can see that the standard deviations of these two are nearly the same, but the other statistics (especially the kurtosis and skewness) indicate that the probability distribution of the signal is far from normal; there are far more positive spikes in the signal than expected for pure noise. Most of these turned out to be the peaks of interest for this signal; they look like spikes only because the length of the signal (over 1,000,000 points) causes the peaks to be compressed into one screen pixel or less when the entire signal is plotted on the screen. In the figures on the left, iSignal is used to "zoom in" on some of the larger of these peaks (using the cursor arrow keys). The peaks are very sparsely separated (by an average of 1000 half-widths between peaks) and are well above the level of background noise (which has a standard deviation of roughly 0.9 throughout the signal).

The researcher who obtained this signal said that a 'good' peak was 'bell shaped', with an amplitude above 5 and a width of 500-1000 x-axis units. So that means that we can expect the signal-to-background-noise ratio to be at least 5/0.9 = 5.5. You can see in the three example peaks on the left that the peak widths do indeed meet those expectations. The interval between adjacent x-axis points is 25, so that means the we can expect the peaks to have about 20 to 40 points in their widths. Based on that, we can expect that the positions, heights and widths of the peaks should be able to be be measured fairly accurately using least-squares methods (which reduce the uncertainty of measured parameters by about the square root of the number of points used - about a factor of 5 in this case). However, the noise appears to be signal-dependent; the noise on the top of the peaks is distinctly greater than the noise on the baseline. The result is that the actual signal-to-noise (S/N) ratio of peak parameter measurement for the larger peaks will not be as good as might be expected based on the ratio of the peak height to the noise on the background. Most likely, the total noise in this signal is the sum of two major components, one with a fixed standard deviation of 0.9 and the other roughly equal to 10% of the peak height.

To automate the detection of large numbers of peaks, we can use the findpeaksG or iPeak functions. Reasonable values of the input arguments AmplitudeThreshold, SlopeThreshold, SmoothWidth, and FitWidth for those functions can be estimated based on the expected peak height (5) and width (20 to 40 data points) of the "good" peaks. For example, using AmplitudeThreshold=5, SlopeThreshold=.001, SmoothWidth=25, and FitWidth=25, these function detect and measure 76 peaks above an amplitude of 5 and with an average peak width of 523. The interactive iPeak function is especially convenient for exploring the effect of these peak detection parameters and for graphically inspecting the peaks that it finds. Ideally the objective is to find a set of peak detection arguments that detect and accurately measure all the peaks that you would consider 'good' and skip all the 'bad' ones. But in reality the criteria for good and bad peaks is at least partly subjective, so it's usually best to err on the side of caution and avoid skipping 'good' peaks at the risk of including a few 'bad' peaks in the mix, which can be weeded out manually based on unusual position, height, width, or appearance.

Of course, it must be expected that the values of the peak position, height, and width given by the findpeaksG or iPeak functions will only be approximate and will vary depending on the exact setting of the peak detection arguments; the noisier the data, the greater the uncertainty in the peak parameters. In this regard the peak-fitting functions peakfit.m and ipf.m usually give more accurate results, because they make use of all the data across the peak, not just the top of the peak as do findpeaksG and iPeak. For example, compare the results of the peak near x=3035200 measured with iPeak (click to view) and with peakfit (click to view). Also, the peak fitting functions are better for dealing with overlapping peaks and for estimating the uncertainty of the measured peak parameters, using the bootstrap options of those functions. For example, the largest peak in this signal has an x-axis position of 2.8683e+007, height of 32, and width of 500; the bootstrap method determines that the standard deviations are 4, 0.92, and 9.3, respectively.

Because the signal in the case study was so large (over 1,000,000 points), the interactive programs such as iPeak, iSignal, and ipf may be sluggish in operation, especially if your computer is not fast computationally or graphically. If this is a serious problem, it may be best to break the signal up into two or more segments and deal with each segment separately, then combine the results. Alternatively, you can use the condense function to average the entire signal into a smaller number of points by a factor of 2 or 3 (at the risk of slightly reducing peak heights and increasing peak widths), but then you should reduce SmoothWidth and FitWidth by the same factor to compensate for the reduced number of data points across the peaks. Run testcondense.m for a demonstration of the condense function.

This page is part of "A Pragmatic Introduction to Signal Processing", created and maintained by Tom O'Haver, Professor Emeritus, Department of Chemistry and Biochemistry, The University of Maryland at College Park. Comments, suggestions and questions should be directed to Prof. O'Haver at toh@umd.edu. Updated July, 2022.