Based on Nita
Ghosh's Scientific
data
analysis with Python: Part 1

A "sliding average" smooth is
simple to implement using the "mean" function in both languages.

**Python**

`import numpy as np`

`import matplotlib.pyplot as
plt`

`from pytictoc import TicToc`

`t = TicToc()`

`plt.rcParams["font.size"] =
16`

`plt.rcParams['figure.figsize']
=
(20, 10)`

`sigRate = 1000 #Hz`

`time = np.arange(0,3,
1/sigRate) `

`n = len(time)`

`p = 30 #poles for random
interpolation`

`ampl =
np.interp(np.linspace(0,p,n),np.arange(0,p),np.random.rand(p)*30)`

`noiseamp = 5`

`noise =
noiseamp*np.random.randn(n)`

`signal = ampl + noise`

` `

`t.tic() # start clock`

`plt.plot(time, ampl)`

`plt.plot(time, signal)`

`filtSig = np.zeros(n)`

`k = 50`

`for i in range(k,n-k-1):`

`
filtSig[i] = np.mean(signal[i-k:i+k])`

`plt.plot(time,
filtSig) `

`t.toc() # stop clock
print time`

` `

`# Save signal to file`

`np.savetxt('SlidingAverageSignal.out',(time,
signal),
delimiter=',') `

In the first several lines,
Python must import some packages that are required for numerical
computing, such as numpy and matplotlib.
Matlab has much of that already built-in but also has optional
add-on "toolboxes"
for special purposes. The program creates a simulated noisy
signal in the first block of code and then uses the "savetext"
function to save it to disc for Matlab to read, ensuring that
the exact same signal is used for both:

** **

**Matlab**

`clear`

`clf`

`load SlidingAverageSignal.out`

`x=SlidingAverageSignal(1,:);`

`y=SlidingAverageSignal(2,:);`

` `

`tic % start clock`

`lx=length(x);`

`filtsig=zeros(1,lx);`

`k=50; % smooth width`

`for i=k:lx-k-1`

` filtsig(i)=mean(y(i-k+1:i +k));`

`end`

`plot(x,y,'g',x,filtsig,'linewidth',1.5)`

`toc % stop clock and print elapsed time`

The fundamental thing here is
the use of the "mean" function in both languages to compute the
mean of *k* adjacent points for the smoothed signal. In Python this is done
by the lines:

`k = 50`

`for i in range(k,n-k-1):`

` ``filtSig[i]
= np.mean(signal[i-k:i+k])`

` `

and in Matlab by:

` `

`k=50;`

`for`` i=k:lx-k-1`

` ``filtsig(i)=mean(y(i-k+1:i
+k));`

`end`

You can see how similar they
are. (I have used the same variable names to make the comparison
easier). A key difference is how a block of code is specified,
such as a loop or a function definition. In Python, this is done
using *indentation*, which is *rigorously enforced*.
In Matlab, it is done using "end" statements and indentation is
optional (but is easily done by the "Smart Indent" in the
right-click menu). Python uses square brackets to enclose the
index to arrays rather that parentheses as does Matlab. Another
important difference: Python arrays are indexed from zero;
Matlab's are indexed from 1. Thus, the first two elements of an
array A in Python are A[0] and A[1] but in Matlab they are A(1)
and A(2). Also, Python uses # to indicate
comment lines; Matlab uses %.

To compare the execution times,
I have used the "tic" and "toc" statements (inbuilt in Matlab
but added to Python by the "TicToc" package) to start and stop a
timer. I have bold-faced those lines in the code, to make it
clear that I am counting and timing only the "inner” portion of
the code that will have to be repeated if you have multiple data
set to process. I don’t count the time required by the initial
set-up, loading of packages, etc. – things that need be done
only once.

**Python: 8 lines; elapsed time 0.04
- 0.08 sec**

**Matlab: 7 lines: ****elapsed time **0.007 - .008
sec

The results show that the
programs are about the same length but that Matlab is a bit
faster in this case.

**The Fourier transform (FT) is fundamental for
computing frequency spectra, convolution, and deconvolution. The
brief codes here implement Fourier convolution: they create a
long vector named "a" consisting of random numbers, compute the
FT, multiply it by itself, and then reverse FT the result, both
using the fft and ifft functions. As before, ****tic** and **toc**
mark the timed blocks. (Fourier *deconvolution *would be
similar except that the two Fourier transforms would be *divided
*rather than * multiplied*).

**Python**

`import numpy as np`

`from scipy import fft`

`from pytictoc import TicToc`

`t = TicToc()`

` `

`t.tic() # start clock`

`min_len = 93059`` ``#
prime length is worst case for speed`

`a = np.random.randn(min_len)`

`b = fft.fft(a)`

`c=b*b`

`d=fft.ifft(c)`

`t.toc()`` ``# stop
clock and print elapsed time`

`tic`

`MinLen = 93059; % prime length is worst case for
speed`

`a=randn(1,MinLen);`

`b = fft.fft(a)`

`c=b.*b;`

`d=ifft(c);`

`toc`

**Python**:

import numpy as np

import matplotlib.pyplot as plt

from numpy.linalg import inv

from pytictoc import TicToc

t = TicToc()

**t.tic()**

# Define a Gaussian peak function

def gaussian(x, center, amplitude, width):

# This generates a single peak with a
gaussian lineshape,

# with a specific center, amplitude, and
width.

# width is the FWHM (full width at half
maximum)

return
amplitude*np.exp(-((x-center)/(0.60056120439323*width))**2)

# Create a set of spectra G representing the components of the
mixture

NumPoints=1000

X=np.linspace(200,800,NumPoints)

center1=450

G1=gaussian(X,center1,1,100)

center2=500

G2=gaussian(X,center2,1,120)

center3=550

G3=gaussian(X,center3,1,130)

E=np.c_[G1,G2,G3] # compose the separate G vectors into a matrix
E

noiseamp=.01

# Specify the amplitudes of the components in the mixture

amp1=1 # Amplitude (height) of peak 1

amp2=.5 # Amplitude (height) of peak 2

amp3=.3 # Amplitude (height) of peak 2

# Compute the simulated noisy mixture spectrum

A=amp1*G1+amp2*G2+amp3*G3+noiseamp*np.random.randn(len(X))

t.tic() # Start clock for execution timing

# Apply normal equation to compute concentration vector C

C = inv(E.T.dot(E)).dot(E.T).dot(A)

plt.plot(X,A)

plt.plot(X,C[0]*G1)

plt.plot(X,C[1]*G2)

plt.plot(X,C[2]*G3)

PercentDifference1=100*(amp1-C[0])/amp1

PercentDifference2=100*(amp2-C[1])/amp2

PercentDifference3=100*(amp3-C[2])/amp3

plt.text(center1,C[0],PercentDifference1, fontsize=22)

plt.text(center2,C[1],PercentDifference2, fontsize=22)

plt.text(center3,C[2],PercentDifference3, fontsize=22)

plt.title('Python: Percent error of peak heights measured by
Classical Least Squares')

**t.toc() # stop clock and print elapsed time**

print(C) # Print measured amplitudes

**Matlab****:**

`NumPoints=1000;
X=linspace(200,800,NumPoints)'; % Creates NumPoints x 1 vector X
center1=450;
G1=gaussian(X,center1,1,100); % Gaussian function defined in
lines 38-43
center2=500;
G2=gaussian(X,center2,1,110);
center3=550;
G3=gaussian(X,center3,1,120);
E=[G1,G2,G3]; % compose the separate G vectors into a matrix E
noiseamp=.01;
% Specify the amplitudes of the components in the mixture
amp1=1; % Amplitude (height) of peak 1
amp2=.5; % Amplitude (height) of peak 2
amp3=.3; % Amplitude (height) of peak 2
% Compute the simulated noisy mixture spectrum
A=amp1.*G1+amp2.*G2+amp3.*G3+noiseamp.*randn(length(X),1) ;
`Note: The bulk of the code in both cases is
devoted to creating a simulated signal and plotting. The actual
computation of concentrations by CLS takes

**Python: 50 lines; 0.017 sec.
Matlab: 41 lines. 0.018 sec.**

**
**

The
automatic detection of peaks is a common requirement. The codes are based on the
Python example in the SciPy documentation:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks.html.
Both languages have a "find peaks" function, but Matlab's is
rather slow, so I prefer to code my own algorithm using
nested loops (each with a required "end" statement, but also
indented for clarity).

The signal used here is a portion of an electrocardiogram that
is included as a sample signal in the scipy package for Python
(and saved by Python for Matlab's use). The codes detect the
peaks that exceed a specified amplitude threshold and mark
them with a small red "x".

**Python**

`import numpy as np`

`import matplotlib.pyplot as
plt`

`from scipy.misc import
electrocardiogram`

`from scipy.signal import
find_peaks`

`from pytictoc import TicToc`

`t = TicToc()`

`x =
electrocardiogram()[2000:4000]`

` `

`t.tic() # start clock`

`peaks, _ = find_peaks(x,
height=0)`

`plt.plot(x)`

`plt.plot(peaks, x[peaks],
"x")`

`plt.plot(np.zeros_like(x),
"--", color="gray")`

`plt.show()`

`t.toc() # stop
clock and print elapsed time`

` `

Matlab

`load electrocardiogram.out;`` `

`y=electrocardiogram;`

`tic % start clock`

` plot(1:length(y),y,'-k')`

` height=0;`

` x=1:length(y);`

` peak=0;`

` for k = 2:length(x)-1`

` if y(k) > y(k-1)`

`
if y(k) > y(k+1)`

`
if y(k)>height`

`
peak = peak + 1;`

`
P(peak,:)=[x(k) y(k)];`

`
end`

`
end`

` end`

` end`

` hold on`

` for k=1:length(P)`

`
text(P(k,1)-12,P(k,2),'x','color',[1 0 0])`

` end`

` grid`

` hold off`

`toc % stop clock and print elapsed time`

Python: 5 lines; elapsed time 0.01 - 0.011 sec

Matlab: 19 lines: elapsed time 0.007 -
0.0071 sec as written

(1.5 sec using the Matlab inbuilt findpeaks.m function)

**(based on https://chrisostrouchov.com/post/peak_fit_xrd_python/)**

**In both Python and Matlab, iterative curve fitting
is more complex that the previous examples. Both languages
have optimization functions ("optimize.minimize" in scipy
package for Python ; the inbuilt "fminsearch"
in Matlab). The Python optimize function is more flexible and
can use several
different
optimization methods. Matlab uses the Nelder-Mead
method, but there is an optional optimization
toolbox that provides additional methods. For this
comparison I'm using the default optimization method for both
cases.**

Python

`import math`

`import numpy as np`

`import matplotlib.pyplot as
plt`

`from scipy import optimize`

`from pytictoc import TicToc`

`t = TicToc()`

`def g(x, A, μ, σ):`

` ``return
A / (σ * math.sqrt(2 * math.pi)) * np.exp(-(x-μ)**2 /
(2*σ**2))`

`def f(x):`

` ``return
np.exp(-(x-2)**2) + np.exp(-(x-6)**2/10) + 1/(x**2 + 1)`

`A = 100.0 # intensity`

`μ = 4.0`` ``#
mean`

`σ = 4.0 # peak width`

`n = 500 # Number of data
points in signal`

`x = np.linspace(-10, 10, n)`

`y = g(x, A, μ, σ) +
np.random.randn(n)`` `

` `

`t.tic() # start clock`

`def cost(parameters):`

` ``g_0
= parameters[:3]`

` ``g_1
= parameters[3:6]`

` ``return
np.sum(np.power(g(x, *g_0) + g(x, *g_1) - y, 2)) / len(x)`

`initial_guess = [5, 10, 4,
-5, 10, 4]`

`result =
optimize.minimize(cost, initial_guess)`

`g_0 = [250.0, 4.0, 5.0]`

`g_1 = [20.0, -5.0, 1.0]`

`x = np.linspace(-10, 10, n)`

`y = g(x, *g_0) + g(x, *g_1)
+ np.random.randn(n)`

`fig, ax = plt.subplots()`

`ax.scatter(x, y, s=1)`

`ax.plot(x, g(x, *g_0))`

`ax.plot(x, g(x, *g_1))`

`ax.plot(x, g(x,*g_0) +
g(x,*g_1))`

`ax.plot(x, y)`

`t.toc()`` ``#
stop clock and print elapsed time`

` `

`np.savetxt('SavedFromPython.out',
(x,y),
delimiter=',') # Save signal to file`

` `

` `

Matlab

`clear`

`clf`

`load SavedFromPython.out`

`x=SavedFromPython(1,:);`

`y=SavedFromPython(2,:);`

`start=[-5 3 6 3];`

`format compact`

`global PEAKHEIGHTS`

` `

`tic % Start clock
`

`FitResults=fminsearch(@(lambda)(fitfunction(lambda,x,y)),start);`

`NumPeaks=round(length(start)./2);`

`for m=1:NumPeaks`

` ``A(m,:)=shapefunction(x,FitResults(2*m-1),FitResults(2*m));`

`end`

`model=PEAKHEIGHTS'*A;`

`plot(x,y,'- r',x,model)`

`hold on;`

`for m=1:NumPeaks`

` ``plot(x,A(m,:)*PEAKHEIGHTS(m));`

`end`

`hold off`

`toc % stop clock and
print elapsed time`

` `

`function err =
fitfunction(lambda,t,y)`

`global PEAKHEIGHTS`

`A =
zeros(length(t),round(length(lambda)/2));`

` ``for j =
1:length(lambda)/2`

` `` `` ``
``A(:,j) = shapefunction(t,lambda(2*j-1),lambda(2*j))';`

` ``end`

`PEAKHEIGHTS = A\y';`

`z = A*PEAKHEIGHTS;`

`err = norm(z-y');`

`end`

` `

`function g =
shapefunction(x,a,b)`

`g =
exp(-((x-a)./(0.6005612.*b)) .^2); % Expression for peak shape`

`end`

Python: 16 lines; elapsed
time 0.02 - 0.08 sec

Matlab: 25 lines; elapsed
time 0.01 - 0.03 sec

An excellent step-by-step introduction to iterative curve fitting in Python is by Emily Grace Ripka.

This page is part of "

Created September, 2021. Updated January, 2022.