[Classical Least Squares (CLS) calibration]
[Inverse Least Squares (ILS) calibration]
[Spreadsheets] [Matlab/Octave]

The spectroscopic analysis of mixtures, when the spectra of the components overlap considerably, can be performed using special calibration methods based on a type of linear least-squares called

Definitions:

n = number of distinct chemical components in the mixture

s = number of samples

s1, s2 = sample 1, sample 2, etc.

c = molar concentration

c1, c2 = component 1, component 2, etc.

w = number of wavelengths at which signal is measured

w1, w2 = wavelength 1, wavelength 2, etc.

Ɛ = analytical sensitivity (slope of a plot of A vs c)

A = analytical signal

M

M

Assumptions:

The analytical signal, A (such as absorbance in absorption spectroscopy, fluorescence intensity in fluorescence spectroscopy, and reflectance in reflectance spectroscopy) is directly proportional to concentration, c. The proportionality constant, which is the slope of a plot of A vs c, is Ɛ.

A = Ɛc

The total signal is the sum of the signals for each component in a mixture:

A_{total}
= A_{c1} +
A_{c2} +
... for all *n* components.

Classical Least Squares (CLS) calibration.

A_{w1}=Ɛ_{c1,w}_{1} c_{c1} + Ɛ_{c2,w}_{1} c_{c2} + Ɛ_{c3,w}_{1} **c**_{c3} + ... for all *n* components.

A_{w2}=Ɛ_{c1,}_{w}_{2} **c**_{c1} + Ɛ_{c2,}_{w}_{2} **c**_{c2} + Ɛ_{c3,w}_{2} **c**_{c3} + ...

and so on for all wavelengths - w3, w4, etc. It's messy to write out all these individual terms, especially because there may be

**A** = **Ɛ****C
**

If you have a sample solution containing unknown amounts of components those n components, you measure its spectrum

(Ɛ^{T}Ɛ)^{-1}Ɛ^{T}**A** = (Ɛ^{T}Ɛ)^{-1}Ɛ^{T}Ɛ**C**
= (Ɛ^{T}Ɛ)^{-1}(Ɛ^{T}Ɛ)**C**

where Ɛ^{T }is the *
transpose *of Ɛ
(rows and columns switched). But the quantity (**Ɛ**^{T}**Ɛ**)^{-1}(**Ɛ**^{T}**Ɛ**) is a matrix times its
inverse and is therefore unity. Thus:

C**
**= (Ɛ^{T}Ɛ)^{-1}Ɛ^{T}**A**

Two extensions of the CLS method are commonly made. First, in order to account for baseline shift caused by drift, background, and light scattering, a column of 1s is added to the Ɛ matrix. This has the effect of introducing into the solution an additional component with a flat spectrum; this is referred to as “background correction”. Second, in order to account for the fact that the precision of measurement may vary with wavelength, it is common to perform a

**C** =
(**Ɛ**^{T} **Ɛ**^{-1}**Ɛ**)^{-1} **Ɛ**^{T} **V**^{-1 }**A**

The method is in principle applicable to any number of overlapping components. Its accuracy is limited by how accurately the spectra of the individual components are known, the amount of noise in the signal, the extent of overlap of the spectra, and the linearity of the analytical curves of each component (the extent to which the signal amplitudes are proportional to concentration). In practice, the method does not work well with old-style instruments with manual wavelength control, because of insufficient wavelength reproducibility. Specifically, many measurements are made on the

c_{s1} = m_{w1}A_{s1,}_{w1} + m_{w2}A_{s1,w2}+ m_{w3}A_{s1,}_{w3} + ... for all *w* wavelengths.

c_{s2} = m_{w1}A_{s2,}_{w}_{1} + m_{w2}A_{s2,}_{w}_{2} + m_{w3}A_{s2,}_{w3} + ...

and so on for all *s* samples. In matrix
form

**C **= **AM**

Now, suppose that you have a set of standard samples that are typical of the type of sample that you wish to be able to measure and which contain a range of analyte concentrations that span the range of concentrations expected to be found in other samples of that type. This will serve as the

**M **= (**A**^{T}**A**)^{-1}**A**^{T}**C
**

(Note that A^{T}A is a square matrix of size
w, the number of
wavelengths, which must be less than s). This calibration vector can be used to
compute the analyte concentrations of other samples, which are
similar to but not in the calibration set, from the measured
spectra of the samples:

**C **= **AM**

Spreadsheets

C**
**= (Ɛ^{T}Ɛ)^{-1}Ɛ^{T}**A**

are performed in these spreadsheets by
the TRANSPOSE (matrix transpose), MMULT (matrix multiplication),
and MINVERSE (matrix inverse) array functions, laid out
step-by-step in rows 123 to
158 of this spreadsheet. Alternatively, all these array
operations may be combined into one big scary cell equation:

C = MMULT (MMULT (MINVERSE (MMULT (TRANSPOSE (E);E)); TRANSPOSE (E));A)

where C is the vector of the 5
concentrations of all the components in the mixture, E is the 5 ×
100 rectangular matrix of the known sensitivities (e.g.
absorptivities) for each of the 5 components at
each of the 100 wavelengths, and A is the vector of measured
signals at each of the 100 wavelengths (i.e. the signal
spectrum) of the unknown mixture.
(Note: spreadsheet array functions like this must be entered
by typing Ctrl-Shift-Enter,
not just Enter as
usual, See "Guidelines
and examples of array formulas".

.

OpenOffice Calc spreadsheet demonstrating the CLS procedure for the measurement of a 5-component unknown mixture at 100 wavelengths

OpenOffice Calc spreadsheet demonstrating the CLS procedure for the measurement of a 5-component unknown mixture at 100 wavelengths

Alternatively, you can skip over all the details above and use the built-in

A template for performing a 5-component analysis on your own data, with step-by-step instructions, in available as RegressionTemplate.xls and RegressionTemplate.ods (Graphic with example data from the demo above). Paste you own data into columns B - G. You must adjust the formulas if your number of data points or of components is different from this example. The easiest way to add more wavelengths is to select an entire row anywhere between row 40 and the end, right-click on the row number on the left and select

Spreadsheets of this type, though easy to use once constructed, must be carefully modified for different applications that have different numbers of components or wavelengths, which is inconvenient and can be error-prone. However, it is possible to construct these spreadsheets in such a way that they

where E is the rectangular matrix of sensitivities at each wavelength for each component and A is the observed spectrum of the mixture. Note that the Matlab/Octave notation is not only shorter than the spreadsheet notation, it's also closer to the traditional mathematical notation. Even more compactly, you can write C = A/E, using the Matlab forward slash or "right divide" operator, which yields the same results but is in principle more accurate with respect to the numerical precision of the computer (usually negligible compared to the noise in the signal; see CaseStudies.html#Numerical).

MeasuredAmp = ObservedSpectrum*A'*inv(A*A')

Extensions:

(a) The extension to

MeasuredAmp = ObservedSpectrum*A'*inv(A*A')and the resulting "MeasuredAmp" will be an "s" × "n"

(b) The extension to

background=ones(size(ObservedSpectrum));

A=[background A1 A2 A3];

where A1, A2, A3... are the absorption spectra of the individual components.

(c) Performing a **T-weighted regression** is also readily
performed:

MeasuredAmp=([T
T] .* A)\(ObservedSpectrum .* T);where T is the transmission spectrum. Here, the matrix division backslash "\" is used as a short-cut to the classical least-squares matrix solution (c.f. http://www.mathworks.com/help/techdoc/ref/mldivide.html).

heights=cls(x, y, NumPeaks, PeakShape, Positions, Widths, extra)

where x and y are the vectors of measured data (e.g. x might be wavelength and y might be the absorbance at each wavelength), 'NumPeaks' is the number of peaks, 'PeakShape' is the peak shape number (1=Gaussian, 2=Lorentzian, 3=logistic, 4=Pearson, 5=exponentially broadened Gaussian; 6=equal-width Gaussians; 7=Equal-width Lorentzians; 8=exponentially broadened equal-width Gaussian, 9=exponential pulse, 10=sigmoid, 11=Fixed-width Gaussian, 12=Fixed-width Lorentzian; 13=Gaussian/Lorentzian blend; 14=BiGaussian, 15=BiLorentzian), 'Positions' is the vector of peak positions on the x axis (one entry per peak), 'Widths' is the vector of peak widths in x units (one entry per peak), and 'extra' is the additional shape parameter required by the exponentially broadened, Pearson, Gaussian/Lorentzian blend, BiGaussain and BiLorentzian shapes. Cls.m returns a vector of measured peak heights for each peak.

The demonstration script clsdemo.m (Graphic) creates some noisy model data, fits it with cls.m, computes the accuracy of the measured heights, then repeats the calculation

On the other hand, INLS can be better than CLS if there areFigure(1): Classical Least Squares (multilinear regression)

Elapsed time is 0.000937 seconds.

Average peak height accuracy = 0.9145%

Figure(2): Iterative non-linear peak fitting with peakfit.m

Elapsed time is 0.171765 seconds.

Average peak height accuracy = 1.6215%

The cls2.m function is similar to cls.m, except that it also measures the baseline (assumed to be flat), using extension to "background correction" described above, and returns a vector containing the background B and measured peak heights H for each peak , e.g. [B H1 H2 H3...].

Weighted linear regression: The downloadable Matlab/Octave function "tfit.m" simulates the measurement of the absorption spectrum of a mixture of three components by weighted linear regression (on line 61), demonstrates the effect of the amount of noise in the signal, the extent of overlap of the spectra, and the linearity of the analytical curves of each component. This function also compares the results to a more advanced method described later (line 66) that applies curve fitting to the

Note: you can right-click on any of the m-file links above and select Save Link As... to download them to your computer for use within Matlab.

Revised July, 2017. This page is part of "

Unique visits since May 17, 2008: