Multiple Linear Regression of Spectral Data

Computer Methods in Chemical Engineering


Problem Statement: Develop a regression model to find sample composition from its color (spectral intensities at a series of wavelengths) in a two-component system. The data look like the following.

      Mole        Total
    Fraction  Concentration  ------Spectral Intensity------
                  (M)
      y<1>        y<2>        x<1>    x<2>    ...   x<30>
  ----------------------------------------------------------
      0.75       3.5230      1.6600  1.5830   ...  2.5220
       :           :           :       :             :
       :           :           :       :             :
Here is one such set of experimental data (nadh.dat) for you to work with. The first column of the data file contains the mole fraction, the second column the total concentration, and columns three and beyond the spectral intensity at a series of wavelengths. Thus, the dependent variables are the sample composition, i.e., mole fraction and total concentration;
  Y = [y<1> y<2>]
and the independent variables are the sample color, i.e., spectral intensities at thirty wavelengths.
  X = [x<1> x<2> ... x<30>]
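The column layout described above can be read into Y and X with a few lines of Python. This is only a sketch: it assumes the data file is a whitespace-delimited numeric table with no header line, the helper name load_spectral_data is hypothetical, and the sample text is a small placeholder (only its first row echoes the values shown in the table, truncated to three intensity columns; the other rows are made up).

```python
import io
import numpy as np

# Placeholder text standing in for the data file. Only the first row echoes the
# values shown in the table (3 of the 30 intensity columns); the rest is made up.
sample_text = """\
0.75  3.5230  1.6600  1.5830  2.5220
0.50  2.0000  1.1000  0.9000  1.4000
0.25  1.0000  0.6000  0.4000  0.7000
"""

def load_spectral_data(source):
    """Read a whitespace-delimited table; return Y (columns 1-2) and X (the rest)."""
    data = np.loadtxt(source, ndmin=2)
    Y = data[:, :2]   # mole fraction, total concentration
    X = data[:, 2:]   # spectral intensities, one column per wavelength
    return Y, X

Y, X = load_spectral_data(io.StringIO(sample_text))
print("Y shape:", Y.shape)
print("X shape:", X.shape)
```

If the real file follows the same layout, passing its file name in place of the StringIO object should work the same way.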
Before you start, you may want to center each variable around its mean value and rescale it by its standard deviation so that the new scaled variables are roughly of order 1, e.g., y<1>~[-1, 1]. Because the independent variables are correlated, the scalar product matrix of the vectors x<j>, i.e., the matrix X^T*X, is singular (or nearly singular), and naive regression based on the following normal equation does not work: you will have trouble evaluating (X^T*X)^-1.
  Naive regression:  mole fraction y<1> = a11*x<1> + a21*x<2> + a31*x<3> + ... + a30,1*x<30> + error<1>
  Naive regression:  total conc.   y<2> = a12*x<1> + a22*x<2> + a32*x<3> + ... + a30,2*x<30> + error<2>
  Regress Y against X:                     Y=X*a+error
  Normal equation provides the solution:   a=(X^T*X)^-1*X^T*Y
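To see why the normal equation fails here, one can scale the variables and inspect the rank and condition number of the scalar product matrix. The sketch below does this in Python with NumPy. Because the actual data file is not reproduced here, it uses a synthetic two-component mixture as a stand-in; the sample count (20), the noise level, and the helper name standardize are all assumptions made for illustration.

```python
import numpy as np

def standardize(A):
    """Center each column on its mean and divide by its sample standard deviation."""
    mean = A.mean(axis=0)
    std = A.std(axis=0, ddof=1)
    return (A - mean) / std, mean, std

# Synthetic stand-in for the experimental data (hypothetical): 20 samples,
# 30 wavelengths, spectra built from two pure-component spectra plus small noise.
rng = np.random.default_rng(0)
n_samples, n_wavelengths = 20, 30
frac = rng.uniform(0.0, 1.0, n_samples)            # mole fraction, y<1>
conc = rng.uniform(1.0, 5.0, n_samples)            # total concentration, y<2>
pure = rng.uniform(0.2, 1.0, (2, n_wavelengths))   # pure-component spectra
amounts = np.column_stack([frac * conc, (1.0 - frac) * conc])
X_raw = amounts @ pure + 1e-3 * rng.standard_normal((n_samples, n_wavelengths))
Y_raw = np.column_stack([frac, conc])

X, x_mean, x_std = standardize(X_raw)
Y, y_mean, y_std = standardize(Y_raw)

# Scalar product matrix of the scaled spectral intensities (30 x 30)
XtX = X.T @ X
rank = np.linalg.matrix_rank(XtX)
cond = np.linalg.cond(XtX)
print("rank of X^T*X:", rank, "out of", XtX.shape[0])
print("condition number of X^T*X: %.3e" % cond)
```

A rank below 30 and a very large condition number both indicate that inverting the matrix directly is unreliable.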
Find the eigenvalues of the square matrix X^T*X and list them in decreasing order. Also find the associated normalized eigenvectors, v<1>, v<2>, v<3>, .... How many independent eigenvectors are there? Show that all of these eigenvectors are mutually orthogonal and normalized (i.e., v<i>^T*v<j>=0 for i≠j and v<i>^T*v<j>=1 for i=j, or V^T*V=I, where the columns of V are the eigenvectors). It is better to describe each sample in terms of values (scores) along these mutually orthogonal eigenvectors (loadings) rather than in terms of the x<i>.
  score<i>=X*v<i>
In other words, we employ a new coordinate system constructed out of eigenvectors.
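The eigenvalue step and the change of coordinates can be sketched as follows. The matrix X here is a hypothetical, already centered stand-in generated from two latent factors plus noise, not the experimental data. The routine numpy.linalg.eigh is chosen because the scalar product matrix is symmetric; it returns eigenvalues in ascending order, so they are reordered.

```python
import numpy as np

# Hypothetical stand-in for the scaled spectral matrix X (20 samples x 30 wavelengths)
rng = np.random.default_rng(1)
latent = rng.standard_normal((20, 2))
X = latent @ rng.standard_normal((2, 30)) + 1e-3 * rng.standard_normal((20, 30))
X = X - X.mean(axis=0)   # center each column

XtX = X.T @ X

# Eigen-decomposition of the symmetric matrix, reordered to decreasing eigenvalues
eigvals, V = np.linalg.eigh(XtX)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
V = V[:, order]          # column i is the loading v<i>

# Orthonormality check: V^T*V should be the identity matrix
ortho_error = np.abs(V.T @ V - np.eye(V.shape[1])).max()
print("max deviation of V^T*V from I: %.2e" % ortho_error)

# Fraction of the total sum of squares carried by the leading eigenvalues
cum_fraction = np.cumsum(eigvals) / np.sum(eigvals)
print("leading eigenvalues:", eigvals[:5])
print("cumulative fraction:", cum_fraction[:5])

# Scores: coordinates of each sample along the eigenvectors, score<i> = X*v<i>
scores = X @ V
```

The scores inherit orthogonality from the eigenvectors: the scalar product of score<i> with score<j> is zero for i≠j and equals the i-th eigenvalue for i=j.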
  mole fraction y<1> = a11*score<1> + a21*score<2> + a31*score<3> + ...
  total conc.   y<2> = a12*score<1> + a22*score<2> + a32*score<3> + ...
Find the coefficients. How many terms do you need to describe mole fraction and total concentration adequately? Note that ai1*score<i> is the projection of the vector y<1> onto the vector score<i>.
  Projection of y<1> onto score<i> = ai1*score<i> = (y<1>,score<i>)/(score<i>,score<i>)*score<i>
Thus, the coefficient ai1 is
  ai1 = (y<1>,score<i>)/(score<i>,score<i>)
  ai1 = (y<1>,score<i>)  if score<i> is normalized, i.e., (score<i>,score<i>)=1
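The projection formula for the coefficients can be applied to all retained scores at once. The sketch below again uses hypothetical synthetic data, and the number of retained terms k = 2 is an assumed choice for that data, not an answer for the experimental set. Because the scores are mutually orthogonal, the projection coefficients should agree with an ordinary least-squares fit of Y on the retained scores.

```python
import numpy as np

# Hypothetical centered data: 20 samples, 30 wavelengths, 2 composition variables
rng = np.random.default_rng(2)
latent = rng.standard_normal((20, 2))
X = latent @ rng.standard_normal((2, 30)) + 1e-3 * rng.standard_normal((20, 30))
Y = latent @ rng.standard_normal((2, 2)) + 1e-3 * rng.standard_normal((20, 2))
X = X - X.mean(axis=0)
Y = Y - Y.mean(axis=0)

# Loadings sorted by decreasing eigenvalue
eigvals, V = np.linalg.eigh(X.T @ X)
V = V[:, np.argsort(eigvals)[::-1]]

k = 2                      # number of retained terms (assumed for this sketch)
scores = X @ V[:, :k]      # column i is score<i>

# a[i, j] = (y<j>, score<i>) / (score<i>, score<i>)
a = (scores.T @ Y) / np.sum(scores**2, axis=0)[:, None]

# Fitted values and fraction of variation explained for each dependent variable
Y_fit = scores @ a
r2 = 1.0 - np.sum((Y - Y_fit)**2, axis=0) / np.sum(Y**2, axis=0)
print("coefficients a:\n", a)
print("R^2 for y<1>, y<2>:", r2)
```

Repeating the fit for k = 1, 2, 3, ... and watching how R^2 changes is one simple way to judge how many terms are adequate.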
Finally, provide the regression equation y(spectral intensities) to predict composition (mole fraction and total concentration) from color.
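One way to arrive at a regression equation in the original spectral intensities is to fold the retained loadings and the scaling back into a slope matrix and an intercept. The sketch below does so under the same assumptions as before: synthetic two-component data instead of the experimental file, an assumed k = 2, and hypothetical function names fit_pcr and predict.

```python
import numpy as np

def fit_pcr(X_raw, Y_raw, k):
    """Regress scaled Y on the first k scores of scaled X; return intercept and slope in raw units."""
    x_mean, x_std = X_raw.mean(axis=0), X_raw.std(axis=0, ddof=1)
    y_mean, y_std = Y_raw.mean(axis=0), Y_raw.std(axis=0, ddof=1)
    X = (X_raw - x_mean) / x_std
    Y = (Y_raw - y_mean) / y_std
    eigvals, V = np.linalg.eigh(X.T @ X)
    Vk = V[:, np.argsort(eigvals)[::-1][:k]]       # k leading loadings
    scores = X @ Vk
    a = (scores.T @ Y) / np.sum(scores**2, axis=0)[:, None]
    b = Vk @ a                                     # coefficients on the scaled x<j>
    slope = b / x_std[:, None] * y_std[None, :]    # undo the scaling of x and y
    intercept = y_mean - x_mean @ slope
    return intercept, slope

def predict(X_new, intercept, slope):
    """Evaluate y = intercept + x*slope for each row of raw spectral intensities."""
    return intercept + X_new @ slope

# Synthetic two-component mixture (hypothetical stand-in for the experimental data)
rng = np.random.default_rng(3)
frac = rng.uniform(0.0, 1.0, 20)
conc = rng.uniform(1.0, 5.0, 20)
pure = rng.uniform(0.2, 1.0, (2, 30))
amounts = np.column_stack([frac * conc, (1.0 - frac) * conc])
X_raw = amounts @ pure + 1e-3 * rng.standard_normal((20, 30))
Y_raw = np.column_stack([frac, conc])

intercept, slope = fit_pcr(X_raw, Y_raw, k=2)
Y_pred = predict(X_raw, intercept, slope)
print("max error in total concentration:", np.max(np.abs(Y_pred[:, 1] - Y_raw[:, 1])))
print("max error in mole fraction:", np.max(np.abs(Y_pred[:, 0] - Y_raw[:, 0])))
```

In this synthetic mixture the total concentration is a linear function of the spectra, whereas the mole fraction is not, so the linear model reproduces the former much more closely than the latter.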

Solution:



Forward comments to:
Nam Sun Wang
Department of Chemical & Biomolecular Engineering
University of Maryland
College Park, MD 20742-2111
301-405-1910 (voice)
301-314-9126 (FAX)
e-mail: nsw@umd.edu
©1996-2006 by Nam Sun Wang