The interfacing of measurement instrumentation to small computers
has now become standard practice in the modern science laboratory.
Computers are used for data acquisition, data processing, and storage, using
a large number of digital computer-based numerical methods.
Techniques are available that can transform signals into more
useful forms, detect and measure peaks, reduce noise, improve the
resolution of overlapping peaks, compensate for instrumental
artifacts, test hypotheses, optimize measurement strategies,
diagnose measurement difficulties, and decompose complex signals
into their component parts. These techniques can often make
difficult measurements easier by extracting more information from
the available data. Many of these techniques are based on
laborious mathematical procedures and/or analog electronics that
were not really practical before the advent of computerized
instrumentation.
It is important to appreciate the abilities, as well as the
limitations, of these techniques. In recent decades,
computers and digital storage and processing have become
commonplace, much more accurate, far less costly, easier to
program, and literally millions of times more capable,
reducing the cost of raw data and making complex
computer-based signal processing techniques both more practical and
more necessary. Computations that were previously impractical are now
common, and approximations and shortcuts that were once
necessitated by mathematical convenience are no longer needed. But
it's not just the growth of computer power: there are now new
materials, new instruments, new fabrication techniques, new
automation capabilities. Sensors are now smaller and cheaper and
faster than ever before; we can measure over a wider range of
speeds, temperatures, pressures, and locations. There are new kinds
of data that we never had before. As Erik Brynjolfsson and
Andrew McAfee wrote in The Second Machine Age (W. W.
Norton, 2014): "...many types of raw data are getting dramatically
cheaper, and as data get cheaper, the bottleneck increasingly is
the ability to interpret and use data". Kate
Keahey, a Senior Scientist at Argonne National Laboratory,
involved with gravitational wave research, has said that "Software
is a vital part of the research landscape, and most researchers
will benefit from understanding its possibilities, limitations and
the requirements for building it".
This essay covers only topics related to
one-dimensional time-series signals, not two-dimensional data
such as images. It uses a pragmatic approach and is limited to
mathematics only up to the most elementary aspects of calculus,
statistics, and matrix math. I use logical arguments, analogies,
graphics, and animation to explain ideas, rather than lots of
formal mathematics. Data processing without math? Not really!
Math is essential, just as it is for the technology of
cell phones, GPS, digital photography, the Web, and computer
games. But you can get started using these tools
without understanding all the underlying math and software
details. Seeing it work makes it more likely that you'll want to
understand how it works. Still, in the long run, it's
not enough just to know how to operate the software, any more
than knowing how to use a word processor or a MIDI sequencer
makes you a good author or musician.
Why do I
title this document "signal processing" rather than "data
processing"? By "signal" I mean the continuous x,y
numerical data recorded by scientific instruments as time-series,
where x may be time or another quantity like energy
or wavelength, as in the various forms of
spectroscopy. "Data" is a more general term that includes categorical
data as well. In other words, I'm oriented to data
that you would plot in a spreadsheet using the scatter
chart type rather than bar or pie charts.
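As a rough illustration of what "signal" means here, the following Python sketch builds a small x,y data set of the kind an instrument might record: a single peak plus random noise. The peak shape (a Gaussian), the noise level, the sampling interval, and the helper name gaussian_peak are all assumptions chosen for the example; they are not taken from any particular instrument or from my own software.

```python
import math
import random

def gaussian_peak(x, position, width):
    """Unit-height Gaussian; width is the full width at half maximum."""
    return math.exp(-4 * math.log(2) * ((x - position) / width) ** 2)

random.seed(0)  # fixed seed so the illustration is repeatable

# x is the independent variable (time here, but it could be wavelength or energy)
x = [i * 0.1 for i in range(101)]  # 0.0 to 10.0 in steps of 0.1

# y is the measured quantity: one peak plus a little random noise (assumed values)
y = [gaussian_peak(xi, position=5.0, width=2.0) + random.gauss(0, 0.02) for xi in x]

# Each (x, y) pair is one point on a scatter plot; find the largest y value
peak_index = max(range(len(y)), key=lambda i: y[i])
print(f"{len(x)} points; largest y = {y[peak_index]:.3f} at x = {x[peak_index]:.1f}")
```

The two lists x and y are exactly what you would paste into two spreadsheet columns and plot as a scatter chart.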
Some of the
examples come from my
own areas of research in analytical chemistry, but these
techniques have been used in a wide
range of application areas. My software has been cited in 750
journal papers, theses, and patents, covering fields ranging from
industrial, environmental, medical, engineering, earth science,
space, military, financial, and agricultural applications to music and
linguistics. Suggestions and experimental data sent by hundreds of
readers from their own work have helped shape my writing and
software development. Much effort has gone into making this
document concise and understandable; it has been highly praised by many readers.
At the present time, this work does not cover image processing,
pattern recognition, or factor analysis. For more advanced topics
and for a more rigorous treatment of the underlying mathematics,
refer to the extensive literature on signal processing and on
statistics and chemometrics.
Throughout this work, a wide range of
applications and connections are described, some potentially
intriguing, such as stock market investing (page
322), human cognitive biases (page
356), the failure of a NASA spacecraft (page
72), cosmic rays from outer space (page
50), adding one kind of noise to reduce another (page
304), studying beach erosion by wind-blown sand (page
300), detecting potentially toxic trace elements in soot (page
11), programming with artificial intelligence (page
442), a technique that expands the classical limits of
measurement in spectroscopy (page
271), the intelligibility of digitized speech (pages 99
and 381),
low-cost miniature computers (page
339), and an easy way to create interactive GUI apps (page
363). The citations list (page
501) is evidence of a truly mind-boggling range of published
applications. Note: the page numbers here refer to the PDF
version.
This site makes considerable use of Matlab, a
high-performance commercial and proprietary numerical computing
environment and "fourth generation" programming language that is
widely used in research (14, 17, 19, 20); Octave, a free
Matlab alternative that runs almost all of the programs and
examples in this tutorial; and Python,
a powerful but free and open-source language. There is a good
reason why Matlab and Python have become so popular in science
and engineering: they are powerful, fast, and relatively easy to
learn. A very important aspect of both Matlab and Python is the
concept of functions, which are self-contained modules of code
that accomplish a specific task. Functions usually "take in"
data, process it, and "return" a result. (A trivial example is a=sqrt(b),
which takes the value of b, computes its square root,
and assigns it to the variable a.) Once a function is
written, it can be used over and over again. Functions
can be "called" from the inside of other functions. Matlab and
Python come with built-in functions for doing data processing
tasks like matrix math, filtering, Fourier transforms,
convolution and deconvolution, multi-linear regression, and
optimization. You can write your own custom functions to
use in your future programming projects, and you can download
from their collections of thousands of useful user-contributed
functions. Both Matlab and Python offer a large number of
add-ons, called toolboxes in Matlab and packages in Python,
which have been created by experts in
various fields for performing specialized tasks, including
various mathematical tasks, parallel computing, symbolic math,
and interfacing to other languages such as Mathematica and
to libraries written in C, C++, Java, Fortran, and Python.
Matlab is also extensible to model-based
design for dynamic and embedded
systems. A companion piece for Matlab called Simulink
is a graphical programming environment for modeling,
simulating and analyzing multidomain dynamical systems.
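The idea of functions described above can be sketched in a few lines of Python. This is only an illustration: the function names rms and signal_to_noise are hypothetical, and dividing a peak height by the root-mean-square of a baseline segment is just a simple stand-in for a signal-to-noise estimate, not the method used by any function of mine.

```python
import math

def rms(values):
    """Take in a list of numbers and return their root-mean-square value."""
    mean_square = sum(v * v for v in values) / len(values)
    return math.sqrt(mean_square)  # a built-in function called from inside our own

def signal_to_noise(peak_height, baseline):
    """Return peak height divided by the RMS of a baseline segment (hypothetical helper)."""
    return peak_height / rms(baseline)  # one custom function calling another

# The trivial example from the text: a = sqrt(b)
b = 16.0
a = math.sqrt(b)

# Made-up numbers, chosen only so the arithmetic is easy to follow
baseline = [0.5, -0.5, 0.5, -0.5]
snr = signal_to_noise(10.0, baseline)
print(a, snr)
```

Once rms is written, it can be reused anywhere, which is the point of the paragraph above: signal_to_noise does not need to know how rms works, only what it takes in and what it returns.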
There are
many code examples in this text that you can copy and paste
(or drag and drop) into the Matlab/Octave command line to run or
modify, which is especially convenient if you can split your screen between
this page and the command window. If you try to run one of my scripts
or functions and it gives you a "missing function" error, that
means either that you have not yet downloaded that item from
my web site or that you have not placed it in the "path". Look
for the missing item on functions.html,
download it into your path, and try again. Type "help path" at the
Matlab/Octave command prompt for help and related commands.
You can also
download working examples of most of the techniques covered here
in spreadsheet
format for Excel or OpenOffice Calc. See functions.html#spreadsheets
for links to these.
Octave (currently version 6.4.0)
and the OpenOffice
Calc (LibreOffice
Calc) spreadsheet program can be downloaded without cost
from their respective web sites. Python is also a
free download.
All of the
scripts and functions and spreadsheets used here can be downloaded from this site at no
cost (see functions.html); they have received extraordinarily
positive feedback from users.
If you are unfamiliar with Matlab, read the sections on basics and on functions and scripts
for a quick start. Matlab is not really a general-purpose
programming language like C++ or Python; rather, it is
specifically suited to numerical methods, matrix manipulation,
plotting of functions and data, implementation of algorithms,
creation of user interfaces, and deployment to portable devices
such as tablets - essentially the needs of numerical computing
by scientists and engineers. Matlab is more loosely
typed and less well structured in a formal sense than
other languages, and thus tends to be more favored by scientists
and engineers and less well liked by computer scientists and
professional programmers. Bringing a general-purpose language like
Python up to the point where Matlab starts takes considerable
effort and some familiarity with computer jargon, because you must
install add-on "packages" to get functions that Matlab already
comes with. This is not a
criticism of Python, which is an extremely capable and
widely-used language, just an observation of different needs for
different fields.
There are
several versions of Matlab, including lower-cost student
and home
versions. See https://www.mathworks.com/pricing-licensing.html
for prices and restrictions on their use. Your workplace
may have a site license for Matlab. There
are also several good free alternatives to Matlab, in
particular Octave, which is essentially a Matlab clone;
there are also Scilab, FreeMat, Julia,
and Sage,
which are somewhat compatible with the Matlab language and
which illustrate the influence of Matlab in the scientific
computing community. For a discussion of other possibilities,
see
http://www.dspguru.com/dsp/links/matlab-clones.
Current personal
computers and laptops are now so fast at calculating and
plotting that it is possible to work with data and signal
processing in a new way, in real time,
or interactively, pressing a key or clicking
the mouse and seeing the results instantly, for example
using my keystroke-driven
programs, Matlab "Live scripts",
Matlab
"apps", or Python Jupyter Notebooks. These
programming methods have made working with data a different
experience.
This work is dedicated to the Joy of Uncompetitive
Purposefulness.
"As we
benefit from the inventions of others, we should be glad
to share our own ... freely and gladly".
Benjamin Franklin
"...in
our culture of competitive self-comparison, we can choose to
amplify each other's accomplishments because there is, after
all, enough to go around." Maria
Popova
"People are generally better persuaded by the reasons which they
have themselves discovered than by those which have come into
the mind of others." Blaise Pascal
"...producing technologies, and then teaching them to others,
... pushes humankind ahead". David Premack
"A computer does not substitute for judgment any more than a
pencil substitutes for literacy. But writing without a pencil is
no particular advantage." Robert
McNamara
"...in the course of looking deeply within ourselves, we may
challenge notions that give comfort before the terrors of the
world. Supporters of superstition and pseudoscience are human
beings with real feelings, who, like the skeptics, are trying to
figure out how the world works and what our role in it might be.
Their motives are in many cases consonant with science."
Carl Sagan, in The
Demon-Haunted World: Science as a Candle in the Dark.
"...[be] full of wonder, generously open to every notion,
[dismiss] nothing except for good reason, but at the same time,
and as second nature, [demand] stringent standards of evidence,
...[applied] with at least as much rigor to what [you] hold dear
as to what [you] are tempted to reject with impunity." Carl
Sagan
References
1. Douglas A. Skoog, Principles of Instrumental Analysis,
Third Edition, Saunders, Philadelphia, 1984. Pages 73-76.
2. Gary D. Christian and James E. O'Reilly, Instrumental
Analysis, Second Edition, Allyn and Bacon, Boston, 1986.
Pages 846-851.
3. Howard V. Malmstadt, Christie G. Enke, and Gary Horlick, Electronic
Measurements
for Scientists, W. A. Benjamin, Menlo Park, 1974. Pages
816-870.
4. Stephen C. Gates and Jordan Becker, Laboratory Automation
using the IBM PC, Prentice Hall, Englewood Cliffs, NJ, 1989.
5. Muhammad A. Sharaf, Deborah L Illman, and Bruce R. Kowalski, Chemometrics,
John Wiley and Sons, New York, 1986.
8. A. Felinger, Data Analysis and Signal Processing in
Chromatography, Elsevier Science (19 May 1998).
9. Matthias Otto, Chemometrics: Statistics and Computer
Application in Analytical Chemistry, Wiley-VCH (March 19, 1999).
Some parts viewable in Google
Books.
10. Steven W. Smith, The Scientist and Engineer's Guide to
Digital Signal Processing. (Downloadable chapter by chapter
in PDF format from http://www.dspguide.com/pdfbook.htm).
This is a much more general treatment of the topic.
16. Chao Yang, Zengyou He and Weichuan Yu, Comparison of
public peak detection algorithms for MALDI mass spectrometry data
analysis, http://www.biomedcentral.com/1471-2105/10/4
19. Nicholas Laude, Christopher Atcherley, and Michael Heien, Rethinking
Data Collection and Signal Processing. 1. Real-Time Oversampling
Filter for Chemical Measurements, https://pubs.acs.org/doi/abs/10.1021/ac302169y
23. R. de Levie, Advanced Excel for scientific data analysis,
Oxford University Press, New York (2004)
24. S. K. Mitra, Digital Signal Processing, a computer-based
approach, 4th edition, McGraw-Hill, New York, 2011.
25. "Calibration in Continuum-Source AA by
Curve Fitting the Transmission Profile", T. C. O'Haver and J.
Kindervater, J. of Analytical Atomic Spectroscopy 1, 89
(1986)
26. "Estimation of Atomic
Absorption Line Widths in Air-Acetylene Flames by Transmission
Profile Modeling", T. C. O'Haver and Jing-Chyi Chang, Spectrochim.
Acta 44B, 795-809 (1989)
27. "Effect of the Source/Absorber
Width Ratio on the Signal-to-Noise Ratio of Dispersive
Absorption Spectrometry", T. C. O'Haver, Anal. Chem. 68, 164-169 (1991).
28. "Derivative
Luminescence Spectrometry", G. L. Green and T. C. O'Haver, Anal.
Chem. 46, 2191 (1974).
29. "Derivative
Spectroscopy", T. C. O'Haver and G. L. Green, American
Laboratory 7, 15 (1975).
30. "Numerical Error Analysis of Derivative
Spectroscopy for the Quantitative Analysis of Mixtures", T. C.
O'Haver and G. L. Green, Anal. Chem. 48, 312 (1976).
31. "Derivative
Spectroscopy: Theoretical Aspects", T. C. O'Haver, Anal.
Proc. 19, 22-28 (1982).
32. "Derivative
and Wavelength Modulation Spectrometry," T. C. O'Haver, Anal.
Chem. 51, 91A (1979).
33. "A
Microprocessor-based Signal Processing Module for Analytical
Instrumentation", T. C. O'Haver and A. Smith, American Lab.
13, 43 (1981).
34. "Introduction
to Signal Processing in Analytical Chemistry", T. C. O'Haver, J.
Chem. Educ. 68 (1991)
35. "Applications
of Computers and Computer Software in Teaching Analytical
Chemistry", T. C. O'Haver, Anal. Chem. 68, 521A
(1991).
36. "The
Object is Productivity", T. C. O'Haver, Intelligent
Instruments and Computers March-April, 1992, p
67-70.
37. Analysis
software for spectroscopy and mass spectrometry, Spectrum Square
Associates ( http://www.spectrumsquare.com/).
38. Fityk, a program for data processing and
nonlinear curve fitting. (http://fityk.nieto.pl/)
44. Nate Silver, The
Signal and the Noise: Why
So Many Predictions Fail-but Some Don't, Penguin Press,
2012. ISBN 159420411X. A much broader look at "signal" and
"noise", aimed at a general audience, but still worth reading.
59. T. C. O'Haver, Teaching and Learning Chemometrics with Matlab,
Chemometrics and Intelligent Laboratory Systems 6, 95-103
(1989).
60. Allen B. Downey, "Think DSP", Green Tree Press, 2014.
(164-page PDF download). Python code instruction using sound as a
basis.
61. Purnendu K. Dasgupta, et al., "Black Box Linearization for
Greater Linear Dynamic Range: The Effect of Power Transforms
on the Representation of Data", Anal. Chem. 2010, 82,
10143 - 10150.
62. Joseph Dubrovkin, Mathematical Processing of Spectral Data in
Analytical Chemistry: A Guide to Error Analysis, Cambridge
Scholars Publishing, 2018, 379 pages. ISBN 978-1-5275-1152-1.
63. Power Law Approach as a Convenient Protocol for Improving Peak
Shapes and Recovering Areas from Partially Resolved Peaks, M.
Farooq Wahab, Fabrice Gritti, Thomas C. O'Haver, Garrett
Hellinghausen, Daniel W. Armstrong, Chromatographia
(2018). https://doi.org/10.1007/s10337-018-3607-0.
64. T. C. O'Haver, Interactive
Simulations of Basic Electronic and Operational Amplifier
Circuits, https://terpconnect.umd.edu/~toh/ElectroSim,
(1996)
65. Signal Processing at Rice University. (http://dsp.rice.edu/software/)
66. Steven Pinker, The Sense of Style: The Thinking
Person's Guide to Writing in the 21st Century, New York,
NY: Penguin, 2014.
68.
Separations at the Speed of Sensors, D. C. Patel, M. Farooq
Wahab, T. C. O'Haver, and Daniel W. Armstrong, Analytical
Chemistry 2018 90 (5), 3349-3356, DOI:
10.1021/acs.analchem.7b04944
69. MF
Wahab, TC O'Haver, F. Gritti, G.Hellinghausen, and DW Armstrong,
"Increasing chromatographic resolution of analytical signals
using derivative enhancement approach," Talanta, vol. 192, pp.
492 - 499, 2019
72. Yuri Kalambet, Yuri Kozmin, Andrey Samokhin, "Comparison of
integration rules in the case of very narrow chromatographic peaks", Chemometrics
and Intelligent Laboratory Systems 179, May 2018. DOI:
10.1016/j.chemolab.2018.06.001
73. Yuri Kalambet, et al., "Reconstruction of chromatographic
peaks using the exponentially modified Gaussian function", Journal
of Chemometrics June 2011, 25(7):352 - 356. DOI:
10.1002/cem.1343
74. Allen, L. C., Gladney, H. M., Glarum, S. H., J. Chem. Phys.
40, 3135 (1964)
75. J. W. Ashley, Charles N. Reilley, "De-Tailing and Sharpening
of Response Peaks in Gas Chromatography", Anal. Chem., 37,
6, 626-630, 1965.
76. M. Johansson, M. Berglund and D. C. Baxter, "Improving accuracy
in the quantitation of overlapping, asymmetric, chromatographic
peaks by deconvolution: theory and application to coupled gas
chromatography atomic absorption spectrometry", Spectrochimica
Acta, Vol 48B, p. 1393-1409, 1993.
77. S. Sterlinski, "A Method for Resolution Enhancement of
Interfering Peaks in Ge(Li) Gamma-Ray Spectra", J. of
Radioanalytical Chemistry, 31, 195-226, 1976.
78. "Importance
of academic blogs",
Teachers Insurance and Annuity Association of America-College
Retirement Equities Fund, New York, NY.
https://careerpurpose.com/industries/education/academic-blogs.
79. Robi Polikar, The Wavelet Tutorial,
http://web.iitd.ac.in/~sumeet/WaveletTutorial.pdf
80. C. Valens, "A
Really Friendly Guide to Wavelets",
http://agl.cs.unm.edu/~williams/cs530/arfgtw.pdf
81. Brani Vidakovic and Peter Mueller, "Wavelets for Kids", http://www.gtwavelet.bme.gatech.edu/wp/kidsA.pdf
82. Amara Graps, "An
Introduction to Wavelets",
https://www.eecis.udel.edu/~amer/CISC651/IEEEwavelet.pdf
83. Muhammad Ryan, "What
is Wavelet and How We Use It for Data Science",
https://towardsdatascience.com/what-is-wavelet-and-how-we-use-it-for-data-science-d19427699cef
84. Michael X. Cohen, "A better way to define and describe Morlet
wavelets for time-frequency analysis", NeuroImage, Volume 199, 1
October 2019, Pages 81-86.
85. Wahab M. F, O'Haver T. C., "Wavelet transforms in separation science
for denoising and peak overlap detection." J Sep Sci. 43 (9-10)
1615-2012 (2020). ISSN 1615-9306;
https://doi.org/10.1002/jssc.202000013
86. G. K. Wertheim, J. of Electron Spectroscopy and Related
Phenomena, 6 (1975) 239-251.
87. R. E. Sturgeon, et al., "Atomization in graphite-furnace atomic
absorption spectrometry. Peak-height method vs. integration method
of measuring absorbance". Anal. Chem. 47, 8, 1240-1249 (1975)
https://doi.org/10.1021/ac60358a039
88. Sunaina et al, "Calculating
numerical derivatives using Fourier transform: some pitfalls and
how to avoid them",
Eur. J. Phys. 39 ,065806, 2018
89. Sinex, Scott A, Investigating types of errors. Spreadsheets
in Education 2.1 (2005): 115-124.
90. Catherine Perrin, Beata Walczak, and Desire Luc Massart, "Quantitative
Determination of the Components in Overlapping Chromatographic
Peaks Using Wavelet Transform", Analytical Chemistry 2001 73
(20), 4903-4917; DOI: 10.1021/ac010416a
91. F. Gritti, S. Besner, S. Cormier, M. Gilar, Applications of
high-resolution recycling liquid chromatography: from small to
large molecules, Journal of Chromatography A 1524 (2017)
108-120.
92. Desimoni E. and Brunetti B., "About Estimating the Limit
of Detection by the Signal to Noise Approach", Pharmaceutica
Analytica Acta 67, 4, 2015. DOI: 10.4172/2153-2435.100035.
93. Royal Society of Chemistry Analytical Methods Committee,
"Recommendations for the Definition, Estimation and Use of the
Detection Limit",
Analyst, Feb. 1987, vol.112, p. 199.
94. "MATLAB vs Python:
Why and How to Make the Switch",
https://realpython.com/matlab-vs-python/
95. MLAB, an advanced mathematical and statistical modeling
system, by Gary Knott.
97. "Why and How
Savitzky-Golay Filters Should Be Replaced", Michael Schmid, David
Rath, and Ulrike Diebold, ACS Measurement Science Au 2022 2 (2),
185-196. DOI: 10.1021/acsmeasuresciau.1c00054
98. Farooq Wahab and Thomas C. O'Haver, "Peak deconvolution
with significant noise suppression and stability using a facile
numerical approach in Fourier space", Chemometrics and
Intelligent Laboratory Systems 235, 2023. https://authors.elsevier.com/c/1gVwgcc6MExCW
99. M.F. Wahab, F. Gritti, T.C. O'Haver, Discrete Fourier
transform techniques for noise reduction and digital enhancement
of analytical signals, TrAC, Trends Anal. Chem., 143,
Article 116354 (2021)
101. Nick
Bilton, “Future Tense”, Vanity Fair, Oct. 2013.
Updated August, 2024.
This page is part of "A
Pragmatic Introduction to Signal Processing", created
and maintained by Prof. Tom
O'Haver, Department of Chemistry and Biochemistry, The
University of Maryland at College Park. Comments, suggestions and
questions should be directed to Prof. O'Haver at toh@umd.edu.