Problem Statement: Develop a regression model to find sample composition from its color (spectral intensities at a series of wavelengths) in a two-component system. The data look like the following.
Mole       Total
Fraction   Concentration   -----Spectral Intensity-----
           (M)
y<1>       y<2>            x<1>     x<2>     ...  x<30>
------------------------------------------------------
0.75       3.5230          1.6600   1.5830   ...  2.5220
:          :               :        :             :
:          :               :        :             :
Here is one such set of experimental data (nadh.dat) for you to work with.
The first column of the data file contains the mole fraction, the
second column the total concentration, and columns three and
beyond the spectral intensity at a series of wavelengths. Thus,
the dependent variables are the sample composition, i.e., mole
fraction and total concentration;
    Y = [y<1> y<2>]
and the independent variables are the sample color, i.e., spectral intensities at thirty wavelengths.
    X = [x<1> x<2> ... x<30>]
Before you start, you may want to center each variable around its mean value and rescale it by its standard deviation so that the new scaled variables are roughly of order 1, i.e., y<1>~[-1, 1]. Because the independent variables are correlated, the scalar product matrix of the various vectors x<j>, i.e., the matrix X^T*X, is singular (or nearly singular), and naive regression based on the following normal equation does not work. You will have trouble evaluating (X^T*X)^-1.
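The scaling step and the near-singularity of X^T*X can be checked numerically before any regression is attempted. The sketch below is a minimal illustration using NumPy. The data are synthetic (two hypothetical pure-component spectra plus a little noise), not the contents of nadh.dat, and the helper name autoscale is an assumption introduced here.

```python
import numpy as np

def autoscale(A):
    """Center each column on its mean and divide by its sample standard deviation."""
    mean = A.mean(axis=0)
    std = A.std(axis=0, ddof=1)
    return (A - mean) / std, mean, std

# Synthetic stand-in for the spectra (NOT nadh.dat): 40 samples, 30 wavelengths,
# built from only two underlying components, so the columns are strongly correlated.
rng = np.random.default_rng(0)
conc = rng.uniform(0.5, 2.0, size=(40, 2))       # hidden amounts of the two components
spectra = rng.uniform(0.1, 1.0, size=(2, 30))    # hypothetical pure-component spectra
X_raw = conc @ spectra + 1e-4 * rng.standard_normal((40, 30))

X, x_mean, x_std = autoscale(X_raw)
XtX = X.T @ X

print("column means are ~0 after scaling:", np.allclose(X.mean(axis=0), 0.0))
print("condition number of X^T X: %.3e" % np.linalg.cond(XtX))
```

A very large condition number means that inverting X^T*X amplifies rounding and measurement noise, which is the difficulty with the naive normal equation.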
Naive regression:
    mole fraction y<1> = a11*x<1> + a21*x<2> + a31*x<3> + ... + a30,1*x<30> + error<1>
    total conc.   y<2> = a12*x<1> + a22*x<2> + a32*x<3> + ... + a30,2*x<30> + error<2>
Regress Y against X:
    Y = X*a + error
The normal equation provides the solution:
    a = (X^T*X)^-1 * X^T * Y
Find the eigenvalues of the square matrix X^T*X and list them in decreasing order. Also find the associated normalized eigenvectors, v<1>, v<2>, v<3>, .... How many independent eigenvectors are there? Show that these eigenvectors are mutually orthogonal and normalized (i.e., v<i>^T*v<j> = 0 for i ≠ j and v<i>^T*v<i> = 1, so that V^T*V = I, where V is the matrix whose columns are the eigenvectors). It is better to describe each sample in terms of its values (scores) along these mutually orthogonal eigenvectors (loadings) rather than in terms of x<i>.
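One possible way to carry out the eigenvalue step is sketched below with NumPy's symmetric eigensolver. The data are again synthetic rather than nadh.dat, and the cutoff used to count "significant" eigenvalues (0.1% of the total) is an arbitrary assumption for illustration, not a rule from the problem statement.

```python
import numpy as np

# Synthetic, autoscaled two-component data (NOT nadh.dat), for illustration only.
rng = np.random.default_rng(1)
X_raw = rng.uniform(0.5, 2.0, size=(40, 2)) @ rng.uniform(0.1, 1.0, size=(2, 30))
X_raw += 1e-4 * rng.standard_normal(X_raw.shape)
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)

# eigh is intended for symmetric matrices and returns eigenvalues in ascending
# order, so reorder to list them in decreasing order.
eigvals, V = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]      # column j of V is v<j>

# Orthonormality: V^T V should be the identity matrix.
print("V^T V = I:", np.allclose(V.T @ V, np.eye(V.shape[1])))

# Share of the total carried by each eigenvalue; the 1e-3 cutoff is hypothetical.
fraction = eigvals / eigvals.sum()
n_significant = int(np.sum(fraction > 1e-3))
print("leading eigenvalue fractions:", fraction[:4])
print("eigenvalues above the cutoff:", n_significant)
```

For a two-component system one would expect only a couple of eigenvalues to stand clearly above the noise level, which is what the count at the end is meant to show.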
    score<i> = X*v<i>
In other words, we employ a new coordinate system constructed out of the eigenvectors.
    mole fraction y<1> = a11*score<1> + a21*score<2> + a31*score<3> + ...
    total conc.   y<2> = a12*score<1> + a22*score<2> + a32*score<3> + ...
Find the coefficients. How many terms do you need to describe mole fraction and total concentration adequately? Note that ai1*score<i> is the projection of the vector y<1> onto the vector score<i>.
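The question of how many terms are needed can be explored by adding scores one at a time and watching the fitting error. The following sketch assumes synthetic two-component data (not nadh.dat), with y<1> and y<2> generated from the hidden component amounts. Because the score vectors are mutually orthogonal, each coefficient is computed independently as a projection.

```python
import numpy as np

def autoscale(A):
    """Center each column and divide by its sample standard deviation."""
    return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)

# Synthetic two-component data (NOT nadh.dat), for illustration only.
rng = np.random.default_rng(2)
conc = rng.uniform(0.5, 2.0, size=(40, 2))        # hidden amounts of each component
spectra = rng.uniform(0.1, 1.0, size=(2, 30))     # hypothetical pure-component spectra
X = autoscale(conc @ spectra + 1e-4 * rng.standard_normal((40, 30)))
total = conc.sum(axis=1)
Y = autoscale(np.column_stack([conc[:, 0] / total, total]))   # [y<1>, y<2>]

# Loadings: eigenvectors of X^T X, largest eigenvalue first.
eigvals, V = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Scores: column i of T is score<i> = X v<i>.
T = X @ V

rms_by_k = []
for k in range(1, 5):
    Tk = T[:, :k]
    # a[i, j] = (y<j>, score<i>) / (score<i>, score<i>), one term at a time
    A = (Tk.T @ Y) / np.sum(Tk**2, axis=0)[:, None]
    residual = Y - Tk @ A
    rms = np.sqrt(np.mean(residual**2, axis=0))
    rms_by_k.append(rms)
    print(f"k={k}: rms error (mole fraction, total conc.) = {rms}")
```

Note that (score<i>,score<i>) equals the i-th eigenvalue of X^T*X, since T^T*T = V^T*(X^T*X)*V is diagonal. Terms whose eigenvalues sit at the noise level contribute little to the fit.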
    Projection of y<1> onto score<i> = ai1*score<i> = (y<1>,score<i>)/(score<i>,score<i>)*score<i>
Thus, the coefficient ai1 is
    ai1 = (y<1>,score<i>)/(score<i>,score<i>)
    ai1 = (y<1>,score<i>)    if score<i> is normalized, i.e., (score<i>,score<i>)=1
Finally, provide the regression equation y(spectral intensities) to predict composition (mole fraction and total concentration) from color.
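To express the final regression equation in terms of the measured spectral intensities, the score coefficients can be folded back through the loadings: since score<i> = X*v<i>, the scaled prediction is X*(V_k*A), and undoing the scaling gives y in its original units. The sketch below is one way to assemble this, an approach commonly called principal component regression. The function names fit_pcr and predict, the choice k=2, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_pcr(X_raw, Y_raw, k):
    """Regress autoscaled Y on the first k scores of autoscaled X.

    Returns the scaling constants and B = V_k A, the coefficients that
    multiply the scaled spectral intensities directly.
    """
    x_mean, x_std = X_raw.mean(axis=0), X_raw.std(axis=0, ddof=1)
    y_mean, y_std = Y_raw.mean(axis=0), Y_raw.std(axis=0, ddof=1)
    X = (X_raw - x_mean) / x_std
    Y = (Y_raw - y_mean) / y_std

    eigvals, V = np.linalg.eigh(X.T @ X)
    Vk = V[:, np.argsort(eigvals)[::-1][:k]]        # k leading loadings
    T = X @ Vk                                      # scores
    A = (T.T @ Y) / np.sum(T**2, axis=0)[:, None]   # a_ij = (y_j, s_i) / (s_i, s_i)
    B = Vk @ A                                      # y_scaled = x_scaled @ B
    return dict(x_mean=x_mean, x_std=x_std, y_mean=y_mean, y_std=y_std, B=B)

def predict(model, X_new):
    """Apply the regression equation to raw spectral intensities."""
    X_scaled = (X_new - model["x_mean"]) / model["x_std"]
    return model["y_mean"] + model["y_std"] * (X_scaled @ model["B"])

# Synthetic two-component data (NOT nadh.dat), for illustration only.
rng = np.random.default_rng(3)
conc = rng.uniform(0.5, 2.0, size=(40, 2))
spectra = rng.uniform(0.1, 1.0, size=(2, 30))       # hypothetical pure-component spectra
X_raw = conc @ spectra + 1e-4 * rng.standard_normal((40, 30))
total = conc.sum(axis=1)
Y_raw = np.column_stack([conc[:, 0] / total, total])   # mole fraction, total conc.

model = fit_pcr(X_raw, Y_raw, k=2)
Y_hat = predict(model, X_raw)
print("max abs error, total concentration:", np.max(np.abs(Y_hat[:, 1] - Y_raw[:, 1])))
print("max abs error, mole fraction:", np.max(np.abs(Y_hat[:, 0] - Y_raw[:, 0])))
```

With real data, X_raw would hold columns three onward of the data file and Y_raw the first two columns, and the number of terms k would be chosen from the eigenvalue and residual analysis rather than fixed in advance.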
Solution: