Appendix X: Batch processing

When you have a large volume of similar data to process, it's useful to automate the process. Let's assume that you have already acquired the data as a large number of text or numerical data files in some standardized format, stored in a known directory (folder) on your computer. For example, they might be ASCII .txt or .csv files with the independent variable ('x') in the first column and one or more dependent variables ('y') in the other columns. The number of data files, their names, and their lengths may be unknown and variable, but the data format is consistent from file to file. You could write a Matlab script or function that processes those files one by one, but what you really want is for the computer to go through all the data files in that directory automatically: determine their file names, load each one into the variable workspace, apply the desired processing operations (peak detection, deconvolution, curve fitting, or whatever), collect the resulting terminal window output, labeled with the file name, in a growing "diary" file, and then go on to the next data file. Ideally, the program should not stop if it encounters a fatal error with one of the data files; rather, it should skip that file and go on to the next.
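The file-by-file loop described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the BatchProcess.m script itself: it uses "dir" rather than "ls" to collect the file names, it builds its own temporary folder of made-up sample files so that it can run anywhere, and the names MaxY and sample1.txt etc. are hypothetical. The "processing" step is just a placeholder that finds the largest y value in each file.

```matlab
% Hypothetical setup: make a temporary folder holding three small
% two-column data files. In real use, DataDirectory already exists.
DataDirectory = tempname();
mkdir(DataDirectory);
x = (1:5)';
for k = 1:3
    data = [x, k*x.^2];                       % column 1 = x, column 2 = y
    fid = fopen(fullfile(DataDirectory, sprintf('sample%d.txt', k)), 'w');
    fprintf(fid, '%g %g\n', data');           % one "x y" pair per line
    fclose(fid);
end

% Find every .txt file in the directory, whatever its name.
FileList = dir(fullfile(DataDirectory, '*.txt'));
NumFiles = numel(FileList);
MaxY = zeros(NumFiles, 1);

for FileNumber = 1:NumFiles
    FileName = FileList(FileNumber).name;
    data = load(fullfile(DataDirectory, FileName));   % function form of load
    MaxY(FileNumber) = max(data(:,2));                % placeholder processing
    disp([num2str(FileNumber) ': ' FileName '   max y = ' num2str(MaxY(FileNumber))]);
end

% Remove the temporary sample files and folder.
delete(fullfile(DataDirectory, '*.txt'));
rmdir(DataDirectory);
```

In your own application, the placeholder line would be replaced by whatever processing function you want to apply to each file.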

BatchProcess.m is a Matlab/Octave example of just such an automated process that you can use as a framework for your applications. You need only change three things to make this work for you:

(a) the directory name where the data are stored on your computer (DataDirectory) in line 11;
(b) the directory name where the Matlab signal processing functions are stored on your computer (FunctionsDirectory) in line 12; and
(c) the actual processing functions that you wish to apply to each file (which in this example perform peak fitting using the "peakfit.m" function in lines 34 - 41, but could be anything).

When it starts, the routine opens a "diary" file in line 21, located in the FunctionsDirectory, with the file name "BatchProcess<date>.txt" (where <date> is the current date, e.g. 12-Jun-2017). This file captures all the terminal window output during processing. In this example, I am using the peakfit.m function, which generates a FitResults matrix (with the Peak#, Position, Height, Width, and Area of the best-fit model) along with the percent fitting error and R2 value for each data file in that directory. Subsequent runs of the program on the same date are appended to this diary file; on each subsequent day, a new file is begun for that day. You can also optionally save some of the variables in the workspace to data files: add a "save" function after the processing and before the "catch ME" statement (type "help save" at the command prompt for options).
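As a rough sketch of how such a date-stamped diary name can be built, the built-in "date" function returns the current date as text (e.g. 12-Jun-2017), which can be concatenated into the file name and passed to the function form of "diary". The variable names DiaryFolder and DiaryName are hypothetical, and the temporary folder is only a stand-in for the FunctionsDirectory so that the example runs anywhere.

```matlab
% Hypothetical stand-in for FunctionsDirectory.
DiaryFolder = tempdir();

% Build a name such as BatchProcess12-Jun-2017.txt from today's date.
DiaryName = fullfile(DiaryFolder, ['BatchProcess' date() '.txt']);

diary(DiaryName);                 % start logging terminal output to that file
disp('1: example-data-file.txt'); % anything displayed now is recorded
diary('off');                     % stop logging

disp(['Diary written to ' DiaryName]);
```

Because the name changes only when the date changes, repeated runs on the same day reuse the same file name.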

This program uses a couple of coding techniques that are especially useful in automated file processing. It uses the "function forms" of the commands "ls" (line 13), "diary" (line 21), and "load" (line 29) to allow them to accept variables computed within the program. It also uses the "try/catch/end" structure (lines 28, 47, 49), which prevents the program from stopping if it encounters an error on one of the data files; rather, it reports the error for that file and skips to the next one. Here's the general form of that structure:

try
   % The statements between 'try' and 'catch ME' are attempted.
   % If one of them generates an error, the rest are skipped.
catch ME
   % If an error is generated, the statements between
   % 'catch ME' and 'end' are executed instead. The variable
   % ME holds the error details (for example, ME.message).
end
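Here is a small, self-contained illustration of both techniques working together. It is a sketch with made-up file names, not an excerpt from BatchProcess.m: one file is created with valid data and the other is deliberately never created, so that "load" fails on it. The variable names GoodFile, MissingFile, PeakHeights, and NumFailed are hypothetical.

```matlab
% Hypothetical test files: one valid, one that does not exist.
GoodFile = [tempname() '.txt'];
fid = fopen(GoodFile, 'w');
fprintf(fid, '%g %g\n', [1 2 3; 10 20 30]);   % writes the rows "1 10", "2 20", "3 30"
fclose(fid);
MissingFile = [tempname() '.txt'];            % deliberately never created
FileNames = {MissingFile, GoodFile};

PeakHeights = nan(1, numel(FileNames));
NumFailed = 0;

for FileNumber = 1:numel(FileNames)
    try
        % The command form "load FileName" would look for a file that is
        % literally called FileName; the function form below passes the
        % text stored in the variable instead.
        data = load(FileNames{FileNumber});
        PeakHeights(FileNumber) = max(data(:,2));   % placeholder processing
        disp([num2str(FileNumber) ': max y = ' num2str(PeakHeights(FileNumber))]);
    catch ME
        % Report the problem and carry on with the next file.
        NumFailed = NumFailed + 1;
        disp(['Error with file number ' num2str(FileNumber) '.']);
    end
end

delete(GoodFile);   % remove the temporary file
```

The loop reaches the second file even though the first one fails, which is the behavior you want in an unattended batch run.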

After running this script, the "BatchProcess..." diary file will contain all the terminal output. Here's an excerpt from a typical diary file, in which the first two data files in the directory yielded errors that were caught and skipped by the "try/catch/end" structure, but the third one ("2016-08-05-RSCT-2144.txt") and all the following ones worked normally and reported the results of the peak fitting operations:

Error with file number 1.
Error with file number 2.
3: 2016-08-05-RSCT-2144.txt
   Peak#   Position   Height    Width     Area
   1       6594.2     0.1711    0.74403   0.13551
   2       6595.1     0.16178   0.60463   0.1041
   % fitting error   R2
   2.5735            0.99483
4: 2016-09-05-RSCT-2146.txt
   Peak#   Position   Height    Width     Area
   1       6594.7     0.11078   1.4432    0.17017
   2       6595.6     0.04243   0.38252   0.01727
   % fitting error   R2
   4.5342            0.98182
5: 2016-09-09-RSCT-2146.txt
   Peak#   Position   Height    Width     Area
   2       6594       0.05366   0.5515    0.0315
   1       6594.9     0.1068    1.2622    0.1435
   % fitting error   R2
   3.709             0.98743
6: .... etc....

You can also import the diary file into Excel: open an Excel worksheet, click on a cell, click Data > From Text, select the diary file, specify that spaces are to be used as column separators, and click Import. This puts all the collected terminal output into that spreadsheet. Additionally, you might want to save the workspace variables (e.g. as a .mat file).
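Saving selected variables for each data file might look something like the following sketch, which uses the function form of "save" so that the output file name can be computed from the data file name. The FitResults values are copied from the diary excerpt above purely as sample data; the "-results.mat" suffix, the temporary output folder, and the names BaseName, OutName, and Saved are assumptions made for illustration. Note also that Octave's default "save" format may differ from Matlab's binary .mat format; type "help save" for the format options.

```matlab
% Sample results for one data file (values taken from the diary excerpt).
FileName = '2016-08-05-RSCT-2144.txt';
FitResults = [1 6594.2 0.1711  0.74403 0.13551; ...
              2 6595.1 0.16178 0.60463 0.1041];

% Build an output name from the data file name (hypothetical convention).
[~, BaseName] = fileparts(FileName);
OutName = fullfile(tempdir(), [BaseName '-results.mat']);

% Function form of save: store only the named variables.
save(OutName, 'FitResults', 'FileName');

% Read the file back into a structure to confirm what was stored.
Saved = load(OutName);
disp(Saved.FileName);
disp(Saved.FitResults);
```

Saving one file per data file in this way keeps the numerical results available at full precision, independent of how they were displayed in the diary.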

This page is part of "A Pragmatic Introduction to Signal Processing", created and maintained by Prof. Tom O'Haver, Department of Chemistry and Biochemistry, The University of Maryland at College Park. Comments, suggestions and questions should be directed to Prof. O'Haver at toh@umd.edu. Updated July, 2022.