Based on Nita
Ghosh's Scientific
data
analysis with Python: Part 1
A "sliding average" smooth is
simple to implement using the "mean" function in both languages.
import numpy as np
import matplotlib.pyplot as plt
from pytictoc import TicToc
t = TicToc()
plt.rcParams["font.size"] = 16
plt.rcParams['figure.figsize'] = (20, 10)
sigRate = 1000 #Hz
time = np.arange(0,3, 1/sigRate)
n = len(time)
p = 30 #poles for random interpolation
ampl = np.interp(np.linspace(0,p,n),np.arange(0,p),np.random.rand(p)*30)
noiseamp = 5
noise = noiseamp*np.random.randn(n)
signal = ampl + noise
t.tic() # start clock
plt.plot(time, ampl)
plt.plot(time, signal)
filtSig = np.zeros(n)
k = 50
for i in range(k,n-k-1):
filtSig[i] = np.mean(signal[i-k:i+k])
plt.plot(time, filtSig)
t.toc() # stop clock print time
# Save signal to file
np.savetxt('SlidingAverageSignal.out',(time,
signal),
delimiter=',')
In the first several lines, Python must import some packages that are required for numerical computing, such as numpy and matplotlib. Matlab has much of that already built-in but also has optional add-on "toolboxes" for special purposes. The program creates a simulated noisy signal in the first block of code and then uses the "savetext" function to save it to disc for Matlab to read, ensuring that the exact same signal is used for both:
clear
clf
load SlidingAverageSignal.out
x=SlidingAverageSignal(1,:);
y=SlidingAverageSignal(2,:);
tic % start clock
lx=length(x);
filtsig=zeros(1,lx);
k=50; % smooth width
for i=k:lx-k-1
filtsig(i)=mean(y(i-k+1:i +k));
end
plot(x,y,'g',x,filtsig,'linewidth',1.5)
toc % stop clock and print elapsed time
The fundamental thing here is
the use of the "mean" function in both languages to compute the
mean of k adjacent points for the smoothed signal. In Python this is done
by the lines:
k = 50
for i in range(k,n-k-1):
filtSig[i] = np.mean(signal[i-k:i+k])
and in Matlab by:
k=50;
for i=k:lx-k-1
filtsig(i)=mean(y(i-k+1:i +k));
end
You can see how similar they
are. (I have used the same variable names to make the comparison
easier). A key difference is how a block of code is specified,
such as a loop or a function definition. In Python, this is done
using indentation, which is rigorously enforced.
In Matlab, it is done using "end" statements and indentation is
optional (but is easily done by the "Smart Indent" in the
right-click menu). Python uses square brackets to enclose the
index to arrays rather that parentheses as does Matlab. Another
important difference: Python arrays are indexed from zero;
Matlab's are indexed from 1. Thus, the first two elements of an
array A in Python are A[0] and A[1] but in Matlab they are A(1)
and A(2). Also, Python uses # to indicate
comment lines; Matlab uses %.
To compare the execution times,
I have used the "tic" and "toc" statements (inbuilt in Matlab
but added to Python by the "TicToc" package) to start and stop a
timer. I have bold-faced those lines in the code, to make it
clear that I am counting and timing only the "inner” portion of
the code that will have to be repeated if you have multiple data
set to process. I don’t count the time required by the initial
set-up, loading of packages, etc. – things that need be done
only once.
Python: 8 lines; elapsed time 0.04
- 0.08 sec
Matlab: 7 lines: elapsed time 0.007 -
.008 sec
The results show that the
programs are about the same length but that Matlab is a bit
faster in this case.
The Fourier transform (FT) is fundamental for
computing frequency spectra, convolution, and deconvolution. The
brief codes here implement Fourier convolution: they create a
long vector named "a" consisting of random numbers, compute the
FT, multiply it by itself, and then reverse FT the result, both
using the fft and ifft functions. As before, tic and toc
mark the timed blocks. (Fourier deconvolution would be
similar except that the two Fourier transforms would be divided
rather than multiplied).
Python
import numpy as np
from scipy import fft
from pytictoc import TicToc
t = TicToc()
t.tic() # start clock
min_len = 93059 # prime length is worst case for speed
a = np.random.randn(min_len)
b = fft.fft(a)
c=b*b
d=fft.ifft(c)
t.toc() # stop
clock and print elapsed time
tic
MinLen = 93059; % prime length is worst case for speed
a=randn(1,MinLen);
b = fft.fft(a)
c=b.*b;
d=ifft(c);
toc
Python:
import numpy as np
import matplotlib.pyplot as plt
from numpy.linalg import inv
from pytictoc import TicToc
t = TicToc()
t.tic()
# Define a Gaussian peak function
def gaussian(x, center, amplitude, width):
# This generates a single peak with a
gaussian lineshape,
# with a specific center, amplitude, and
width.
# width is the FWHM (full width at half
maximum)
return
amplitude*np.exp(-((x-center)/(0.60056120439323*width))**2)
# Create a set of spectra G representing the components of the
mixture
NumPoints=1000
X=np.linspace(200,800,NumPoints)
center1=450
G1=gaussian(X,center1,1,100)
center2=500
G2=gaussian(X,center2,1,120)
center3=550
G3=gaussian(X,center3,1,130)
E=np.c_[G1,G2,G3] # compose the separate G vectors into a matrix
E
noiseamp=.01
# Specify the amplitudes of the components in the mixture
amp1=1 # Amplitude (height) of peak 1
amp2=.5 # Amplitude (height) of peak 2
amp3=.3 # Amplitude (height) of peak 2
# Compute the simulated noisy mixture spectrum
A=amp1*G1+amp2*G2+amp3*G3+noiseamp*np.random.randn(len(X))
t.tic() # Start clock for execution timing
# Apply normal equation to compute concentration vector C
C = inv(E.T.dot(E)).dot(E.T).dot(A)
plt.plot(X,A)
plt.plot(X,C[0]*G1)
plt.plot(X,C[1]*G2)
plt.plot(X,C[2]*G3)
PercentDifference1=100*(amp1-C[0])/amp1
PercentDifference2=100*(amp2-C[1])/amp2
PercentDifference3=100*(amp3-C[2])/amp3
plt.text(center1,C[0],PercentDifference1, fontsize=22)
plt.text(center2,C[1],PercentDifference2, fontsize=22)
plt.text(center3,C[2],PercentDifference3, fontsize=22)
plt.title('Python: Percent error of peak heights measured by
Classical Least Squares')
t.toc() # stop clock and print elapsed time
print(C) # Print measured amplitudes
Matlab:
NumPoints=1000;
X=linspace(200,800,NumPoints)'; % Creates NumPoints x 1 vector X
center1=450;
G1=gaussian(X,center1,1,100); % Gaussian function defined in
lines 38-43
center2=500;
G2=gaussian(X,center2,1,110);
center3=550;
G3=gaussian(X,center3,1,120);
E=[G1,G2,G3]; % compose the separate G vectors into a matrix E
noiseamp=.01;
% Specify the amplitudes of the components in the mixture
amp1=1; % Amplitude (height) of peak 1
amp2=.5; % Amplitude (height) of peak 2
amp3=.3; % Amplitude (height) of peak 2
% Compute the simulated noisy mixture spectrum
A=amp1.*G1+amp2.*G2+amp3.*G3+noiseamp.*randn(length(X),1) ;
tic % Start clock for execution timing
% Apply normal equation to compute concentration vector C
C = inv(E'*E)*E'*A;
plot(X,A,X,C(1).*G1,X,C(2).*G2,X,C(3).*G3)
PercentDifference1=100.*(amp1-C(1))/amp1;
PercentDifference2=100*(amp2-C(2))/amp2;
PercentDifference3=100*(amp3-C(3))/amp2;
text(center1,C(1),num2str(PercentDifference1),'FontSize',14);
text(center2,C(2),num2str(PercentDifference2),'FontSize',14);
text(center3,C(3),num2str(PercentDifference3),'FontSize',14);
title('Matlab: Percent errors of peak heights measured by
Classical Least Squares')
toc % stop clock and print elapsed time
disp(C) % Print measured amplitudes
% Define a Gaussian peak function
function g=gaussian(x,center,amplitude,width)
% This generates a single peak with a gaussian
lineshape,
% with a specific center, amplitude, and width.
% width is the FWHM (full width at half maximum)
g=amplitude*exp(-((x-center)/(0.60056120439323*width)).^2);
end
Note: The bulk of the code in both cases is
devoted to creating a simulated signal and plotting. The actual
computation of concentrations by CLS takes only one line
in both languages: C =
inv(E.T.dot(E)).dot(E.T).dot(A) in
Python and C = inv(E'*E)*E'*A; in Matlab.
Python: 50 lines; 0.017 sec.
Matlab: 41 lines. 0.018 sec.
The
automatic detection of peaks is a common requirement. The codes here are based
on the Python example in the SciPy documentation:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks.html.
Both languages have a "find peaks" function, but Matlab's is
rather slow, so I prefer to code my own algorithm using
nested loops (each with a required "end" statement, but also
indented for clarity).
The signal used here is a portion of an electrocardiogram that
is included as a sample signal in the scipy package for Python
(and saved by Python for Matlab's use). The codes detect the
peaks that exceed a specified amplitude threshold and mark
them with a small red "x".
Python
import numpy as np
import matplotlib.pyplot as plt
from scipy.misc import electrocardiogram
from scipy.signal import find_peaks
from pytictoc import TicToc
t = TicToc()
x = electrocardiogram()[2000:4000]
t.tic() # start clock
peaks, _ = find_peaks(x, height=0)
plt.plot(x)
plt.plot(peaks, x[peaks], "x")
plt.plot(np.zeros_like(x), "--", color="gray")
plt.show()
t.toc() # stop clock and print elapsed time
Matlab
load electrocardiogram.out;
y=electrocardiogram;
tic % start clock
plot(1:length(y),y,'-k')
height=0;
x=1:length(y);
peak=0;
for k = 2:length(x)-1
if y(k) > y(k-1)
if y(k) > y(k+1)
if y(k)>height
peak = peak + 1;
P(peak,:)=[x(k) y(k)];
end
end
end
end
hold on
for k=1:length(P)
text(P(k,1)-12,P(k,2),'x','color',[1 0 0])
end
grid
hold off
toc % stop clock and print elapsed time
Python: 5 lines; elapsed time 0.01 - 0.011 sec
Matlab: 19 lines: elapsed time 0.007 -
0.0071 sec as written
(1.5 sec using the Matlab inbuilt findpeaks.m function)
(based on https://chrisostrouchov.com/post/peak_fit_xrd_python/)
In both Python and Matlab, iterative curve fitting
is more complex that the previous examples. Both languages
have optimization functions ("optimize.minimize" in scipy
package for Python ; the inbuilt "fminsearch"
function in Matlab). The Python optimize function is more
flexible and can use several
different
optimization methods. Matlab uses the Nelder-Mead
method, but there is an optional optimization
toolbox that provides additional methods. For this
comparison I'm using the default optimization method for both
cases.
import math
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
from pytictoc import TicToc
t = TicToc()
def g(x, A, μ, σ):
return A / (σ * math.sqrt(2 * math.pi)) * np.exp(-(x-μ)**2 / (2*σ**2))
def f(x):
return np.exp(-(x-2)**2) + np.exp(-(x-6)**2/10) + 1/(x**2 + 1)
A = 100.0 # intensity
μ = 4.0 # mean
σ = 4.0 # peak width
n = 500 # Number of data points in signal
x = np.linspace(-10, 10, n)
y = g(x, A, μ, σ) + np.random.randn(n)
t.tic() # start clock
def cost(parameters):
g_0 = parameters[:3]
g_1 = parameters[3:6]
return np.sum(np.power(g(x, *g_0) + g(x, *g_1) - y, 2)) / len(x)
initial_guess = [5, 10, 4, -5, 10, 4]
result = optimize.minimize(cost, initial_guess)
g_0 = [250.0, 4.0, 5.0]
g_1 = [20.0, -5.0, 1.0]
x = np.linspace(-10, 10, n)
y = g(x, *g_0) + g(x, *g_1) + np.random.randn(n)
fig, ax = plt.subplots()
ax.scatter(x, y, s=1)
ax.plot(x, g(x, *g_0))
ax.plot(x, g(x, *g_1))
ax.plot(x, g(x,*g_0) + g(x,*g_1))
ax.plot(x, y)
t.toc() # stop clock and print elapsed time
np.savetxt('SavedFromPython.out', (x,y), delimiter=',') # Save signal to file
Matlab
clear
clf
load SavedFromPython.out
x=SavedFromPython(1,:);
y=SavedFromPython(2,:);
start=[-5 3 6 3];
format compact
global PEAKHEIGHTS
tic % Start clock
FitResults=fminsearch(@(lambda)(fitfunction(lambda,x,y)),start);
NumPeaks=round(length(start)./2);
for m=1:NumPeaks
A(m,:)=shapefunction(x,FitResults(2*m-1),FitResults(2*m));
end
model=PEAKHEIGHTS'*A;
plot(x,y,'- r',x,model)
hold on;
for m=1:NumPeaks
plot(x,A(m,:)*PEAKHEIGHTS(m));
end
hold off
toc % stop clock and print elapsed time
function err = fitfunction(lambda,t,y)
global PEAKHEIGHTS
A = zeros(length(t),round(length(lambda)/2));
for j = 1:length(lambda)/2
A(:,j) = shapefunction(t,lambda(2*j-1),lambda(2*j))';
end
PEAKHEIGHTS = A\y';
z = A*PEAKHEIGHTS;
err = norm(z-y');
end
function g = shapefunction(x,a,b)
g = exp(-((x-a)./(0.6005612.*b)) .^2); % Expression for peak shape
end
Python: 16 lines; elapsed
time 0.02 - 0.08 sec
Matlab: 25 lines; elapsed
time 0.01 - 0.03 sec
Created September, 2021. Updated January, 2022.