[Classical Least Squares (CLS)
calibration] [**Inverse Least Squares
(ILS) calibration**] [Spreadheets]
[Matlab or Octave]

The spectroscopic analysis of mixtures, when the spectra of the components overlap considerably, can be performed using special calibration methods based on a type of linear least-squares called multiple linear regression. This method is widely used in multi-wavelength instruments such as diode-array, Fourier transform, and automated scanning spectrometers. In this case the math involves the application of a little basic matrix algebra (a.k.a., linear algebra), which is just a shorthand notation for dealing with signals expressed as equations with one term for each point.

Definitions:

n = number of distinct chemical components in the mixture

s = number of samples

s1,s2 = sample 1, sample 2, etc.

c = molar concentration

c1, c2 = component 1, component 2, etc.

w = number of wavelengths at which signal is measured

w1, w2 = wavelength 1, wavelength 2, etc.

Ɛ = analytical sensitivity (slope of a plot of A vs c)

A = analytical signal

Assumptions:

The analytical signal, A (such as absorbance in absorption spectroscopy, fluorescence intensity in fluorescence spectroscopy, and reflectance in reflectance spectroscopy) is directly proportional to concentration, c. The proportionality constant, which is the slope of a plot of A vs c, is Ɛ.

A = Ɛc

The total signal is the sum of the signals for each component in a mixture:

A_{total}
= A_{c1} +
A_{c2} +
... for all *n* components.

Classical Least Squares (CLS) calibration. This method is applicable to the quantitative analysis of a mixture of components when the spectra of the individual components are known and where total signal of the mixture is the sum of the signals for each component in the mixture. Measurement of the spectra of known concentrations of the separate components allows their analytical sensitivity Ɛ at each wavelength to be determined. Then it follows that:

A_{w1}=Ɛ_{c1,w}_{1} c_{c1} + Ɛ_{c2,w}_{1} c_{c2} + Ɛ_{c3,w}_{1} **c**_{c3} + ... for all *n* components.

A_{w2}=Ɛ_{c1,}_{w}_{2} **c**_{c1} + Ɛ_{c2,}_{w}_{2} **c**_{c2} + Ɛ_{c3,w}_{2} **c**_{c3} + ...

and so on for all wavelengths - λ3, λ5, etc. It's messy to write out all
these individual terms,
especially because there may be *hundreds*
of wavelengths in modern array-detector spectrometers. But
despite
the mass of raw data, these are just really nothing more than
linear
equations and so the calculations required here are actually
pretty
simple and certainly very easy for a computer to do. So we
really
need a correspondingly simple notation that's more compact. This
big
set of linear equations can be written:

**A** = **Ɛ****C
**

If you have a sample solution containing unknown amounts of components those , then you measure its spectrum

**C** = **Ɛ**^{-1}**A**

(Ɛ^{T}Ɛ)^{-1}Ɛ^{T}**A** = (Ɛ^{T}Ɛ)^{-1}Ɛ^{T}Ɛ**C**
= (Ɛ^{T}Ɛ)^{-1}(Ɛ^{T}Ɛ)**C**

But the quantity (**Ɛ**^{T}**Ɛ**)^{-1}(**Ɛ**^{T}**Ɛ**) is a matrix times its
inverse and is therefore unity. Thus:

C**
**= (Ɛ^{T}Ɛ)^{-1}Ɛ^{T}**A**

Two extensions of the CLS method are commonly made. First, in order to account for baseline shift caused by drift, background, and light scattering, a column of 1s is added to the Ɛ matrix. This has the effect of introducing into the solution an additional component with a flat spectrum; this is referred to as “background correction”. Second, in order to account for the fact that the precision of measurement may vary with wavelength, it is common to perform a

**C** =
(**Ɛ**^{T} **Ɛ**^{-1}**Ɛ**)^{-1} **Ɛ**^{T} **V**^{-1 }**A**

The method is in principle applicable to any number of overlapping components. Its accuracy is limited by how accurately the spectra of the individual components are known, the amount of noise in the signal, the extent of overlap of the spectra, and the linearity of the analytical curves of each component (the extent to which the signal amplitudes are proportional to concentration). In practice, the method does not work well with old-style instruments with manual wavelength control, because of insufficient wavelength reproducibility, but it works better with automated computer-controlled scanning instruments and is perfectly suited to diode-array and Fourier transform instruments. The method is widely applied in absorption spectrophotometry, where the well-known deviations from analytical curve linearity set a limit to the performance to this method (and which can be avoided, in a related method, by applying curve fitting to the transmission spectra).

c_{s1} = m_{w1}A_{s1,}_{w1} + m_{w2}A_{s1,w2}+ m_{w3}A_{s1,}_{w3} + ... for all *w* wavelengths.

c_{s2} = m_{w1}A_{s2,}_{w}_{1} + m_{w2}A_{s2,}_{w}_{2} + m_{w3}A_{s2,}_{w3} + ...

and so on for all *s* samples. In matrix
form

**C **= **AM**

Now, suppose that you have a set of standard samples that are typical of the type of sample that you wish to be able to measure and which contain a range of analyte concentrations that span the range of concentrations expected to be found in other samples of that type. This will serve as the

**M **= (**A**^{T}**A**)^{-1}**A**^{T}**C
**

(Note that A^{T}A is a square matrix of size
w, the number of
wavelengths, which must be less than s). This calibration vector can be used to
compute the analyte concentrations of other samples, which are
similar to but not in the calibration set, from the measured
spectra of the samples:

**C **= **AM**

Spreadheets. Most modern spreadsheets have basic matrix manipulation capabilities and can be used for multicomponent calibration, for example Excel and OpenOffice Calc. The spreadsheets RegressionDemo.xls and RegressionDemo.ods (for Excel and Calc, respectively) demonstrate the classical least squares procedure for a simulated spectrum of a 5-component mixture measured at 100 wavelengths. A screen shot is shown below. The matrix calculations described above that solves for the concentration of the components on the unknown mixture:

C**
**= (Ɛ^{T}Ɛ)^{-1}Ɛ^{T}**A**

are performed in these spreadsheets by
the TRANSPOSE (matrix transpose), MMULT (matrix multiplication),
and MINVERSE (matrix inverse) array functions, laid out
step-by-step in rows 123 to
158 of this spreadsheet. These array operations may be
combined into one cell equation:

C =
MMULT(MMULT(MINVERSE(MMULT(TRANSPOSE(E);E));TRANSPOSE(E));A)

where C is the vector of the 5
concentrations of all the components in the mixture, E is the 5 x
100 rectangular matrix of the known sensitivities (e.g.
absorptivities) for each of the 5 components at
each of the 100 wavelengths, and A is the vector of measured
signals at each of the 100 wavelengths (i.e. the signal
spectrum) of the unknown mixture.
(Note: spreadsheet array functions like this must be entered
by typing Ctrl-Shift-Enter,
not just Enter as
usual, See http://www.openofficetips.com/blog/archives/2004/10/array_formulas.html).

.

OpenOffice Calc spreadsheet demonstrating the CLS procedure for the measurement of a 5-component unknown mixture at 100 wavelengths

OpenOffice Calc spreadsheet demonstrating the CLS procedure for the measurement of a 5-component unknown mixture at 100 wavelengths

Alternatively, you can skip over all the details above and use the

Matlab or Octave are really the natural computer approaches to multicomponent analysis because they handle all types of matrix math so easily, compactly, and quickly. In these languages, the notation is very compact: the transpose of matrix A is A', the inverse of A is inv(A), and matrix multiplication is designated by an asterisk (*). Thus the solution to the classical least squares method above is written in Matlab/Octave notation as

C = inv(E'*E)*E'*A

where E is the rectangular matrix of sensitivities at each wavelength for each component and A is the observed spectrum of the mixture. Note that the Matlab/Octave notation is not only shorter than the spreadsheet notation, it's also closer to the traditional mathematical notation. Even more compactly, you can write C = A * pinv(E')' or C = A/E, which yield essentially the same results in the same execution time.

MeasuredAmp = ObservedSpectrum*A'*inv(A*A')

Extensions: The extension to "background correction" is easily accomplished in Matlab/Octave by adding a column of 1s to the A matrix containing the absorption spectrum of each of the components.:

background=ones(size(ObservedSpectrum));

A=[background A1 A2 A3];

A=[background A1 A2 A3];

where A1, A2, A3... are the absorption spectra of the individual components. Performing a T-weighted regression is also readily performed:

weight=T;

MeasuredAmp=([weight weight] .* A)\(ObservedSpectrum .* weight);

MeasuredAmp=([weight weight] .* A)\(ObservedSpectrum .* weight);

where T is the transmission spectrum. Here, the matrix division backslash "\" is used as a short-cut to the classical least-squares matrix solution (c.f. http://www.mathworks.com/help/techdoc/ref/mldivide.html).

The cls.m function: Ordinarily, the calibration matrix M is assembled from the experimentally measured signals (e.g. spectra) of the individual components of the mixture, but it is also possible to fit a computer-generated model of basic peak shapes (e.g. Gaussians, Lorentzians, etc) to a signal to determine if that signal can be represented as the weighted sum of overlapping basic peak shapes. The function cls.m computes such a model consisting of the sum of any number of peaks of known shape, width, and position, but of unknown height, and fit it to noisy x,y data sets. The syntax is

heights=cls(x,y,NumPeaks,PeakShape,Positions,Widths,extra)

where x and y are the vectors of measured data (e.g. x might be wavelength and y might be the absorbance at each wavelength), 'NumPeaks' is the number of peaks, 'PeakShape' is the peak shape number (1=Gaussian, 2=Lorentzian, 3=logistic, 4=Pearson, 5=exponentially broadened Gaussian; 6=equal-width Gaussians; 7=Equal-width Lorentzians; 8=exponentially broadened equal-width Gaussian, 9=exponential pulse, 10=sigmoid, 11=Fixed-width Gaussian, 12=Fixed-width Lorentzian; 13=Gaussian/Lorentzian blend; 14=BiGaussian, 15=BiLorentzian), 'Positions' is the vector of peak positions on the x axis (one entry per peak), 'Widths' is the vector of peak widths in x units (one entry per peak), and 'extra' is the additional shape parameter required by the exponentionally broadened, Pearson, Gaussain/Lorentzian blend, BiGaussain and BiLorentzian shapes. Cls.m returns a vector of measured peak heights for each peak. clsdemo.m is a demonstration script that creates some noisy model data, fits it with cls.m, then computes the accuracy of the measured heights.

Weighted linear regression: The custom Matlab/Octave function "tfit.m" simulates the measurement of three absorbing components by weighted linear regression, demonstrates the effect of the amount of noise in the signal, the extent of overlap of the spectra, and the linearity of the analytical curves of each component, and also compares the results to a more advanced method that applies curve fitting to the transmission spectra rather than to the absorbance spectra.

Note: you can right-click on any of the m-file links above and select Save Link As... to download them to your computer for use within Matlab.

Revised November, 2014. This page is maintained by Prof. Tom O'Haver , Department of Chemistry and Biochemistry, The University of Maryland at College Park. Comments, suggestions and questions should be directed to Prof. O'Haver at toh@umd.edu.

Unique visits since May 17, 2008: