AppendixA

Appendix M: Curve fitting in spreadsheets and stand-alone programs.

Both Excel and OpenOffice Calc have a "Solver" capability that will change specified cells in an attempt to produce a specified goal; this can be used in peak fitting to minimize the fitting error between a set of data and a proposed calculated model, such as a set of overlapping Gaussian bands. (This external source, https://www.wallstreetmojo.com/solver-in-excel/, has a detailed graphical explanation of using the Solver.) Solver includes three different solving methods. This Excel spreadsheet example (screen shot) demonstrates how this is used to fit four Gaussian components to a sample set of x,y data that has already been entered into columns A and B, rows 22 to 101 (you could type or paste in your own data there).

After entering the data, do a visual estimate of how many Gaussian peaks it might take to represent the data, and their locations and widths, and type those values into the 'Proposed model' table. The spreadsheet calculates the best-fit values for the peak heights (by multilinear regression) in the 'Calculated amplitudes' table and plots the data and the fit. It also plots the "residuals", which are the point-by-point differences between the data and the model; ideally the residuals would be zero, or at least small. (Adjust the x-axis scale of these graphs to fit your data). The next step is to use Solver function to "fine-tune" the position and width of each component to minimize the % fitting error (in red) and to make the residuals plot as random as possible: click Data in the top menu bar, click Solver (upper right) to open the Solver box, into which you type "C12" into "Set Objective", click "min", select the cells in the "Proposed Model" that you want to optimize, add any desired constraints in the "Subject to the Constraints" box, and click the Solve button. The position, width, and amplitude of all the components are automatically optimized by Solver and best fit is displayed. (You can see that the Solver has changed the selected entries in the proposed model table, reduced the fitting error (cell C12, in red), and made the residuals smaller and more random). If the fit fails, change the starting values, click Solver, and click the Solve button. You can automate the above process and reduce it to a single function-key press by using macros, as described in Appendix N.

So, how many Gaussian components does it take to fit the data? One way to tell is to look at the plot of the residuals (which shows the point-by-point difference between the data and the fitted model), and add components until the residuals are random, not wavy, but this works only if the data are not smoothed before fitting. Here's an example - a set of real data that are fit with an increasing sequence of two Gaussians, three Gaussians, four Gaussians, and five Gaussians. As you look at this sequence of screenshots, you'll see the percent fitting error decrease, the R² value become closer to 1.000, and the residuals become smaller and more random. (Note that in the 5-component fit, the first and last components are not peaks within the 250-600 x range of the data, but rather account for the background). There is no need to try a 6-component fit because the residuals are already random at 5 components and more components than that would just "fit the noise" and would likely be unstable and give a very different result with another sample of that signal with different noise.

There are a number of downloadable non-linear iterative curve fitting adds-ons and macros for Excel and OpenOffice. For example, Dr. Roger Nix of Queen Mary University of London has developed a very nice Excel/VBA spreadsheet for curve fitting X-ray photoelectron spectroscopy (XPS) data, but it could be used to fit other types of spectroscopic data. A 4-page instruction sheet is also provided.

The Python language has many options for iterative least-squares; one is is compared to its Matlab equivalent.

There are also many examples of stand-alone freeware and commercial programs, including PeakFit, Data Master 2003, MyCurveFit, Curve Expert, Origin, ndcurvemaster, and the R language.

If you use a spreadsheet for this type of curve fitting, you have to build a custom spreadsheet for each problem, with the right number of rows for the data and with the desired number of components. For example, my CurveFitter.xlsx template is only for a 100-point signal and a 5-component Gaussian model. It's easy to extend to a larger number of data points by inserting rows between 22 and 100, columns A through N, drag-copying the formulas down into the new cells (e.g. CurveFitter2.xlsx is extended to 256 points). To handle other numbers of components or model shapes you would have to insert or delete columns between C and G and between Q and U and edit the formulas, as has been done in this set of templates for 2 Gaussians, 3 Gaussians, 4 Gaussians, 5 Gaussians, and 6 Gaussians.

If your peaks are superimposed on a baseline, you can include a model for the baseline as one of the components. For instance, if you wish to fit 2 Gaussian peaks on a linear tilted slope baseline, select a 3-component spreadsheet template and change one of the Gaussian components to the equation for a straight line (y=mx+b, where m is the slope and b is the intercept). A template for that particular case is CurveFitter2GaussianBaseline.xlsx (graphic); don't click "Make Unconstrained Variables Non-Negative" in this case, because the baseline model may well need negative variables, as it does in this particular example. If you want to use another peak shape or another baseline shape, you'd have to modify the equation in row 22 of the corresponding columns C through G and drag-copy the modified cell down to the last row, as was done to change the Gaussian peak shape into a Lorentzian shape in CurveFitter6Lorentzian.xlsx. Or you could make columns C through G contain equations for different peak or baseline shapes. For exponentially broadened Gaussian peak shapes, you can use CurveFitter2ExpGaussianTemplate.xlsx or two overlapping peaks (screen graphic). In this case, each peak has four parameters: height, position, width, and lambda (which determines the asymmetry - the extent of exponential broadening).

In some cases, it is useful to add constraints to the variables determined by iteration, for example to constrain them to be greater than zero, or constrained between two limits, or equal to each other, etc. Doing so will force the solutions to adhere to known expectations and avoid nonphysical solutions. This is especially important for complex shapes such as the exponentially broadened Gaussian just discussed in the previous paragraph. You can do this by adding those constraints using the "Subject to the Constraints:" box in the center of the "Solver Parameters" box (see the graphic above). For details, see https://www.solver.com/excel-solver-add-change-or-delete-constraint?

The point is that you can do - in fact, you must do - a lot of custom editing to get a spreadsheet template that fits your data. In contrast, my Matlab/Octave peakfit.m function automatically adapts to any number of data points and is easily set to over 40 different model peak shapes and any number of peaks simply by changing the input arguments. Using my Interactive Peak Fitter function ipf.m in Matlab, you can press a single keystroke to instantly change the peak shape, number of peaks, baseline mode, or to re-calculate the fit with different start or with a bootstrap subset of the data. That's far quicker and easier than the spreadsheet. But on the other hand, a real advantage of spreadsheets in this application is that it is relatively easy to add your own custom shape functions and constraints, even complicated ones, using standard spreadsheet cell formula construction. And if you are hiring help, it's probably easier to find an experienced spreadsheet programmer than a Matlab programmer. So, if you are not sure which to use, my advice is to try both methods and decide for yourself.

This page is part of "A Pragmatic Introduction to Signal Processing", created and maintained by Prof. Tom O'Haver , Department of Chemistry and Biochemistry, The University of Maryland at College Park. Comments, suggestions and questions should be directed to Prof. O'Haver at toh@umd.edu. Updated July, 2022.