The interfacing of measurement instrumentation to small computers
has now become standard practice in the modern science laboratory.
Computers are used for data acquisition, data processing, and storage, using
a large number of digital computer-based numerical methods.
Techniques are available that can transform signals into more
useful forms, detect and measure peaks, reduce noise, improve the
resolution of overlapping peaks, compensate for instrumental
artifacts, test hypotheses, optimize measurement strategies,
diagnose measurement difficulties, and decompose complex signals
into their component parts. These techniques can often make
difficult measurements easier by extracting more information from
the available data. Many of these techniques are based on
laborious mathematical procedures and/or analog electronics that
were not really practical before the advent of computerized
instrumentation. It is important to appreciate the abilities, as
well as the limitations, of these techniques. But in recent
decades, computers and digital storage and processing have become
commonplace, much more accurate, far less costly, easier to
program, and literally millions of times more capable
altogether, reducing the cost of raw data and making complex
computer-based signal processing techniques both more practical and
more necessary. Computations that were previously impractical are now
common, and approximations and shortcuts that were once
necessitated by mathematical convenience are no longer needed. But
it's not just the growth of computer power: there are now new
materials, new instruments, new fabrication techniques, new
automation capabilities. We have lasers, fiber optics,
superconductors, supermagnets, holograms, quantum technology,
nanotechnology, and more. Sensors are now smaller and cheaper and
faster than ever before; we can measure over a wider range of
speeds, temperatures, pressures, and locations. There are new kinds
of data that we never had before. As Erik Brynjolfsson and
Andrew McAfee wrote in The Second Machine Age (W. W.
Norton, 2014): "...many types of raw data are getting dramatically
cheaper, and as data get cheaper, the bottleneck increasingly is
the ability to interpret and use data". Kate
Keahey, a Senior Scientist at Argonne National Laboratory,
involved with gravitational wave research, has said that "Software
is a vital part of the research landscape, and most researchers
will benefit from understanding its possibilities, limitations and
the requirements for building it".
This essay covers only basic topics
related to one-dimensional time-series signals, not
two-dimensional data such as images. It uses a pragmatic
approach and is limited to mathematics only up to the most
elementary aspects of calculus, statistics, and matrix math. I
use logical arguments, analogies, graphics, and animation to
explain ideas, rather than lots of formal mathematics. Data
processing without math? Not really! Math is essential,
just as it is for the technology of cell phones, GPS, digital
photography, the Web, and computer games. But you can get started
using these tools without understanding all the underlying
math and software details. Seeing it work makes it more likely
that you'll want to understand how it works. But in the
long run, it's not enough just to know how to operate the
software, any more than knowing how to use a word processor or a
MIDI sequencer makes you a good author or musician.
Why do I
title this document "signal processing" rather than "data
processing"? By "signal" I mean the continuous x,y
numerical data recorded by scientific instruments as time-series,
where x may be time or another quantity like energy
or wavelength, as in the various forms of
spectroscopy. "Data" is a more general term that includes categorical
data as well. In other words, I'm oriented to data
that you would plot in a spreadsheet using the scatter
chart type rather than bar or pie charts.
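As a rough illustration of what "signal" means here, the following Python sketch builds a simulated x,y data set: a single Gaussian-shaped peak with a little random noise added. The peak position, width, noise level, and variable names are all invented for this example and do not come from any real instrument. Plain Python lists are used so that it runs without add-on packages.

```python
import math
import random

random.seed(0)   # fixed seed so the simulated noise is repeatable

# Independent variable, e.g. wavelength in nm (assumed values): 400, 401, ..., 700
x = [400.0 + i for i in range(301)]

# Noise-free Gaussian-shaped peak centered at x = 550 (assumed position and width)
peak = [math.exp(-((xi - 550.0) / 20.0) ** 2) for xi in x]

# The "recorded" signal: the peak plus random noise at every point
y = [p + random.gauss(0.0, 0.02) for p in peak]

# Each x value is paired with one y value, which is what a scatter chart plots
for xi, yi in zip(x[::50], y[::50]):
    print(f"x = {xi:6.1f}   y = {yi:8.4f}")
```

The x list could just as well hold time values; the point is only that the data are continuous numerical x,y pairs rather than categories.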
Some of the
examples come from my
own areas of research in analytical chemistry, but these
techniques have been used in a wide
range of application areas. My software has been cited in
over 500
journal papers, theses, and patents, covering fields including
industry, environmental science, medicine, engineering, earth science,
space, military, finance, agriculture, and even music and
linguistics. Suggestions and experimental data sent by hundreds of
readers from their own work have helped shape my writing and
software development. Much effort has gone into making this
document concise and understandable; it has been highly
praised by many readers.
At the present time, this work does not cover image processing,
pattern recognition, or factor analysis. For more advanced topics
and for a more rigorous treatment of the underlying mathematics,
refer to the extensive literature on signal processing and on
statistics and chemometrics.
This site had its origin in one
of the experiments in a course called "Electronics
and Computer Interfacing for Chemists" that I developed and
taught at the University of Maryland in the 1980s and 1990s. The
first Web-based version went up in 1995. Subsequently it has been
revised and greatly expanded based on feedback from users. It is
still a work in progress and, as such, benefits from continued
feedback from readers and users.
This tutorial makes considerable use of Matlab, a
high-performance commercial and proprietary numerical computing
environment and "fourth generation" programming language that is
widely used in research (14, 17, 19, 20); of Octave, a free
Matlab alternative that runs almost all of the programs and
examples in this tutorial; and also of Python,
a powerful but free and open-source language. There is a good
reason why Matlab is so massively popular in science and
engineering; it's powerful, fast, and relatively easy to learn.
A very important aspect of Matlab is the concept of functions,
which are self-contained modules of code that accomplish a
specific task. Functions usually "take in" data, process it, and
"return" a result. (A trivial example is a=sqrt(b),
which takes the value of b, computes its square root,
and assigns it to the variable a). Once a function is
written, it can be used over and over and over again. Functions
can be "called" from the inside of other functions. Matlab comes
with built-in functions for doing data processing tasks like
matrix math, filtering, Fourier transforms, convolution and
deconvolution, multilinear regression, and optimization. You
can write your own custom functions to use in your future
programming projects, and you can download powerful toolboxes
and free user-contributed functions. Matlab can interface to C,
C++, Java, Fortran, and Python; and it's extensible to symbolic
computing and model-based
design for dynamic and embedded
systems. There are many code examples in this text that
you can copy and paste (or drag and drop) into the
Matlab/Octave command line to run or modify, which is
especially convenient if you can split your screen between the
two. If you try to run one of my scripts
or functions and it gives you a "missing function" error, that
means either that you have not yet downloaded that item from
my web site or that you have not placed it in the "path". Look
for the missing item on functions.html,
download it into your path, and try again. Type "help path" at the
Matlab/Octave command prompt for help and related commands.
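The idea of functions described above carries over directly to Python. The sketch below is only an illustration: the function names rms and signal_to_noise, and the baseline numbers, are hypothetical and are not part of my software. It shows a function that takes in data and returns a result, a function called from inside another function, and the trivial square-root example.

```python
import math

def rms(values):
    """Take in a list of numbers and return their root-mean-square value."""
    mean_square = sum(v * v for v in values) / len(values)
    return math.sqrt(mean_square)   # a built-in function called from inside this one

def signal_to_noise(peak_height, baseline):
    """Return the peak height divided by the RMS of a signal-free baseline."""
    return peak_height / rms(baseline)   # reuses rms() instead of repeating its code

b = 16.0
a = math.sqrt(b)   # the trivial example from the text: a = sqrt(b)

baseline = [0.02, -0.01, 0.03, -0.02, 0.00, -0.03]   # made-up baseline readings
snr = signal_to_noise(1.0, baseline)
print(a, rms(baseline), snr)
```

Once written, rms can be reused anywhere without change. Using the RMS of the baseline as the noise is just one simple choice made for this sketch.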
Most of the
techniques covered in this work can also be performed in spreadsheets
(11, 22, 23) such as Excel or OpenOffice Calc.
Octave (currently version 6.4.0)
and the OpenOffice Calc and LibreOffice Calc
spreadsheet programs can be downloaded without cost
from their respective web sites. Python is also a
free download.
All of the
Matlab/Octave scripts and functions, and all of the spreadsheets
used here, can be downloaded from
this site at no cost; they have received extraordinarily
positive feedback from users. Any function reported as
missing can be found on functions.html.
If you are unfamiliar with Matlab, read these sections about basics and functions and scripts
for a quick start-up. Matlab is not really a general-purpose
programming language like C++ or Python; rather, it is
specifically suited to numerical methods, matrix manipulation,
plotting of functions and data, implementation of algorithms,
creation of user interfaces, and deployment to portable devices
such as tablets - essentially the needs of numerical computing
by scientists and engineers. Matlab is more loosely
typed and less well structured in a formal sense than
other languages, and thus tends to be more favored by scientists
and engineers and less well liked by computer scientists and
professional programmers. Bringing a basic language like Python up
to the point where Matlab starts takes considerable
effort, and some familiarity with computer jargon, to install the add-on
"packages" that supply the functions Matlab comes with. This is not a
criticism of Python, which is an extremely capable and
widely used language, just an observation of different needs for
different fields.
There are
several versions of Matlab, including lower-cost student
and home
versions. See https://www.mathworks.com/pricing-licensing.html
for prices and restrictions on their use. It is possible
that your workplace may have a site license for Matlab. There
are also several good free alternatives to Matlab, in
particular Octave, which is essentially a Matlab clone;
there are also Scilab, FreeMat, Julia,
and Sage,
which are somewhat compatible with the Matlab language and
which illustrate the influence of Matlab in the scientific
computing community. For a discussion of other possibilities,
see
http://www.dspguru.com/dsp/links/matlab-clones.
This work is dedicated to the Joy of Uncompetitive
Purposefulness.
"As we
benefit from the inventions of others, we should be glad
to share our own ... freely and gladly".
Benjamin Franklin
"...in our culture of competitive
self-comparison, we can choose to amplify each other's
accomplishments because there is, after all, enough to go
around." Maria Popova
"People are generally better persuaded by the reasons which they
have themselves discovered than by those which have come into
the mind of others." Blaise Pascal
"...producing technologies, and then teaching them to others,
... pushes humankind ahead". David Premack
"A computer does not substitute for judgment any more than a
pencil substitutes for literacy. But writing without a pencil is
no particular advantage." Robert
McNamara
"...in the course of looking deeply within ourselves, we may
challenge notions that give comfort before the terrors of the
world. Supporters of superstition and pseudoscience are human
beings with real feelings, who, like the skeptics, are trying to
figure out how the world works and what our role in it might be.
Their motives are in many cases consonant with science."
Carl Sagan, in The
Demon-Haunted World: Science as a Candle in the Dark.
"...[be] full of wonder, generously open to every notion,
[dismiss] nothing except for good reason, but at the same time,
and as second nature, [demand] stringent standards of evidence,
...[applied] with at least as much rigor to what [you] hold dear
as to what [you] are tempted to reject with impunity." Carl
Sagan
References
1. Douglas A. Skoog, Principles of Instrumental Analysis,
Third Edition, Saunders, Philadelphia, 1984. Pages 73-76.
2. Gary D. Christian and James E. O'Reilly, Instrumental
Analysis, Second Edition, Allyn and Bacon, Boston, 1986.
Pages 846-851.
3. Howard V. Malmstadt, Christie G. Enke, and Gary Horlick, Electronic
Measurements
for Scientists, W. A. Benjamin, Menlo Park, 1974. Pages
816-870.
4. Stephen C. Gates and Jordan Becker, Laboratory Automation
using the IBM PC, Prentice Hall, Englewood Cliffs, NJ, 1989.
5. Muhammad A. Sharaf, Deborah L Illman, and Bruce R. Kowalski, Chemometrics,
John Wiley and Sons, New York, 1986.
8. A. Felinger, Data Analysis and Signal Processing in
Chromatography, Elsevier Science (19 May 1998).
9. Matthias Otto, Chemometrics: Statistics and Computer
Application in Analytical Chemistry, Wiley-VCH (March 19, 1999).
Some parts viewable in Google
Books.
10. Steven W. Smith, The Scientist and Engineer's Guide to
Digital Signal Processing. (Downloadable chapter by chapter
in PDF format from http://www.dspguide.com/pdfbook.htm).
This is a much more general treatment of the topic.
16. Chao Yang, Zengyou He and Weichuan Yu, Comparison of
public peak detection algorithms for MALDI mass spectrometry data
analysis, http://www.biomedcentral.com/1471-2105/10/4
19. Nicholas Laude, Christopher Atcherley, and Michael Heien, Rethinking
Data Collection and Signal Processing. 1. Real-Time Oversampling
Filter for Chemical Measurements, https://pubs.acs.org/doi/abs/10.1021/ac302169y
23. R. de Levie, Advanced Excel for scientific data analysis,
Oxford University Press, New York (2004)
24. S. K. Mitra, Digital Signal Processing, a computer-based
approach, 4th edition, McGraw-Hill, New York, 2011.
25. "Calibration in Continuum-Source AA by
Curve Fitting the Transmission Profile" , T. C. O'Haver and J.
Kindervater, J. of Analytical Atomic Spectrometry 1, 89
(1986)
26. "Estimation of Atomic
Absorption Line Widths in Air-Acetylene Flames by Transmission
Profile Modeling", T. C. O'Haver and Jing-Chyi Chang, Spectrochim.
Acta 44B, 795-809 (1989)
27. "Effect of the Source/Absorber
Width Ratio on the Signal-to-Noise Ratio of Dispersive
Absorption Spectrometry", T. C. O'Haver, Anal. Chem. 63, 164-169 (1991).
28. "Derivative
Luminescence Spectrometry", G. L. Green and T. C. O'Haver, Anal.
Chem. 46, 2191 (1974).
29. "Derivative
Spectroscopy", T. C. O'Haver and G. L. Green, American
Laboratory 7, 15 (1975).
30. "Numerical Error Analysis of Derivative
Spectroscopy for the Quantitative Analysis of Mixtures", T. C.
O'Haver and G. L. Green, Anal. Chem. 48, 312 (1976).
31. "Derivative
Spectroscopy: Theoretical Aspects", T. C. O'Haver, Anal.
Proc. 19, 22-28 (1982).
32. "Derivative
and Wavelength Modulation Spectrometry," T. C. O'Haver, Anal.
Chem. 51, 91A (1979).
33. "A
Microprocessor-based Signal Processing Module for Analytical
Instrumentation", T. C. O'Haver and A. Smith, American Lab.
13, 43 (1981).
34. "Introduction
to Signal Processing in Analytical Chemistry", T. C. O'Haver, J.
Chem. Educ. 68 (1991)
35. "Applications
of Computers and Computer Software in Teaching Analytical
Chemistry", T. C. O'Haver, Anal. Chem. 63, 521A
(1991).
36. "The
Object is Productivity", T. C. O'Haver, Intelligent
Instruments and Computers March-April, 1992, p
67-70.
37. Analysis
software for spectroscopy and mass spectrometry, Spectrum Square
Associates ( http://www.spectrumsquare.com/).
38. Fityk, a program for data processing and
nonlinear curve fitting. (http://fityk.nieto.pl/)
44. Nate Silver, The
Signal and the Noise: Why
So Many Predictions Fail - but Some Don't, Penguin Press,
2012. ISBN 159420411X. A much broader look at "signal" and
"noise", aimed at a general audience, but still worth reading.
59. T. C. O'Haver, Teaching and Learning Chemometrics with Matlab,
Chemometrics and Intelligent Laboratory Systems 6, 95-103
(1989).
60. Allen B. Downey, "Think DSP", Green Tea Press, 2014.
(164-page PDF download). Python code instruction using sound as a
basis.
61. Purnendu K. Dasgupta, et al., "Black Box Linearization for
Greater Linear Dynamic Range: The Effect of Power Transforms
on the Representation of Data", Anal. Chem. 2010, 82,
10143 - 10150.
62. Joseph Dubrovkin, Mathematical Processing of Spectral Data in
Analytical Chemistry: A Guide to Error Analysis, Cambridge
Scholars Publishing, 2018, 379 pages. ISBN 978-1-5275-1152-1.
63. Power Law Approach as a Convenient Protocol for Improving Peak
Shapes and Recovering Areas from Partially Resolved Peaks, M.
Farooq Wahab, Fabrice Gritti, Thomas C. O'Haver, Garrett
Hellinghausen, Daniel W. Armstrong, Chromatographia
(2018). https://doi.org/10.1007/s10337-018-3607-0.
64. T. C. O'Haver, Interactive
Simulations of Basic Electronic and Operational Amplifier
Circuits, https://terpconnect.umd.edu/~toh/ElectroSim,
(1996)
65. Signal Processing at Rice University. (http://dsp.rice.edu/software/)
66. Steven Pinker, The Sense of Style: The Thinking
Person's Guide to Writing in the 21st Century, New York,
NY: Penguin, 2014.
68.
Separations at the Speed of Sensors, D. C. Patel, M. Farooq
Wahab, T. C. O'Haver, and Daniel W. Armstrong, Analytical
Chemistry 2018 90 (5), 3349-3356, DOI:
10.1021/acs.analchem.7b04944
69. MF
Wahab, TC O'Haver, F. Gritti, G.Hellinghausen, and DW Armstrong,
"Increasing chromatographic resolution of analytical signals
using derivative enhancement approach," Talanta, vol. 192, pp.
492 - 499, 2019
70. MF
Wahab, TC O'Haver, F. Gritti, G. Hellinghausen, and DW Armstrong,
"Increasing
chromatographic resolution of analytical signals using derivative
enhancement approach,"
Talanta, vol. 192, pp. 492-499, 2019
72. Yuri Kalambet, Yuri Kozmin, Andrey Samokhin, "Comparison of
integration rules in the case of very narrow chromatographic peaks", Chemometrics
and Intelligent Laboratory Systems 179, May 2018. DOI:
10.1016/j.chemolab.2018.06.001
73. Yuri Kalambet, et al., "Reconstruction of chromatographic
peaks using the exponentially modified Gaussian function", Journal
of Chemometrics June 2011, 25(7):352 - 356. DOI:
10.1002/cem.1343
74. Allen, L. C., Gladney, H. M., Glarum, S. H., J. Chem. Phys.
40, 3135 (1964)
75. J. W. Ashley, Charles N. Reilley, "De-Tailing and Sharpening
of Response Peaks in Gas Chromatography", Anal. Chem., 37,
6, 626-630, 1965.
76. M. Johansson, M. Berglund and D. C. Baxter, "Improving accuracy
in the quantitation of overlapping, asymmetric, chromatographic
peaks by deconvolution: theory and application to coupled gas
chromatography atomic absorption spectrometry", Spectrochimica
Acta, Vol 48B, p. 1393-1409, 1993.
77. S. Sterlinski, "A Method for Resolution Enhancement of
Interfering Peaks in Ge(Li) Gamma-Ray Spectra", J. of
Radioanalytical Chemistry, 31, 195-226, 1976.
78. "Importance
of academic blogs",
Teachers Insurance and Annuity Association of America-College
Retirement Equities Fund, New York, NY.
https://careerpurpose.com/industries/education/academic-blogs.
79. Robi Polikar, The Wavelet Tutorial,
http://web.iitd.ac.in/~sumeet/WaveletTutorial.pdf
80. C. Valens, "A
Really Friendly Guide to Wavelets",
http://agl.cs.unm.edu/~williams/cs530/arfgtw.pdf
81. Brani Vidakovic and Peter Mueller, "Wavelets for Kids", http://www.gtwavelet.bme.gatech.edu/wp/kidsA.pdf
82. Amara Graps, "An
Introduction to Wavelets",
https://www.eecis.udel.edu/~amer/CISC651/IEEEwavelet.pdf
83. Muhammad Ryan, "What
is Wavelet and How We Use It for Data Science",
https://towardsdatascience.com/what-is-wavelet-and-how-we-use-it-for-data-science-d19427699cef
84. Michael X. Cohen, "A better way to define and describe Morlet
wavelets for time-frequency analysis", NeuroImage, Volume 199, 1
October 2019, Pages 81-86.
85. Wahab M. F, O'Haver T. C., "Wavelet transforms in separation science
for denoising and peak overlap detection." J Sep Sci. 43 (9-10)
1615-2012 (2020). ISSN 1615-9306;
https://doi.org/10.1002/jssc.202000013
86. G. K. Wertheim, J. of Electron Spectroscopy and Related
Phenomena, 6 (1975) 239-251.
87. R. E. Sturgeon, et al., "Atomization in graphite-furnace atomic
absorption spectrometry. Peak-height method vs. integration method
of measuring absorbance". Anal. Chem. 47, 8, 1240-1249 (1975)
https://doi.org/10.1021/ac60358a039
88. Sunaina et al, "Calculating
numerical derivatives using Fourier transform: some pitfalls and
how to avoid them",
Eur. J. Phys. 39, 065806, 2018
89. Sinex, Scott A, Investigating types of errors. Spreadsheets
in Education 2.1 (2005): 115-124.
90. Catherine Perrin, Beata Walczak, and Desire Luc Massart, "Quantitative
Determination of the Components in Overlapping Chromatographic
Peaks Using Wavelet Transform", Analytical Chemistry 2001 73
(20), 4903-4917; DOI: 10.1021/ac010416a
91. F. Gritti, S. Besner, S. Cormier, M. Gilar, Applications of
high-resolution recycling liquid chromatography: from small to
large molecules, Journal of Chromatography A 1524 (2017)
108-120.
92. Desimoni E. and Brunetti B., "About Estimating the Limit
of Detection by the Signal to Noise Approach", Pharmaceutica
Analytica Acta 67, 4, 2015. DOI: 10.4172/2153-2435.100035.
93. Royal Society of Chemistry Analytical Methods Committee,
"Recommendations for the Definition, Estimation and Use of the
Detection Limit",
Analyst, Feb. 1987, vol. 112, p. 199.
94. "MATLAB vs
Python: Why and How to Make the Switch",
https://realpython.com/matlab-vs-python/
95. MLAB, an advanced mathematical and statistical modeling
system, by Gary Knott.
Updated May, 2022.
This page is part of "A
Pragmatic Introduction to Signal Processing", created
and maintained by Prof. Tom
O'Haver, Department of Chemistry and Biochemistry, The
University of Maryland at College Park. Comments, suggestions and
questions should be directed to Prof. O'Haver at toh@umd.edu.