
Python, a free general-purpose language with many numerical computing extensions.

[Signal smoothing]  [Fourier transform]  [Classical Least Squares]  [Peak detection]  [Iterative curve fitting]

A popular alternative to Matlab for scientific programming is Python, which is a free and open-source language, whereas Matlab is closed and proprietary. These two languages are different in many ways, and an experienced Matlab programmer might have some difficulty converting to Python, and vice versa. But the fact is that anything that I have done on this site using Matlab, you can do in Python. I do not intend to provide a complete introduction to Python, any more than I did for Matlab or for spreadsheets. There are many good online sources, for example a quick reference table for switching from Matlab to Python, Scilab, Gnuplot, or R. Rather, I will review a few of the crucial computational methods in signal processing, showing how to do the calculations and plotting in Python and compare the code side-by-side to the equivalent calculation in Matlab. In addition, I will compare the execution of both versions, running on the same desktop computer system, to see if there is any advantage one way or the other in execution speed or code size.

For this comparison I am running Python 3.8.8 (Anaconda Individual Edition 2020.11), using the included Spyder 5.0.5 desktop (graphic), and Matlab 9.10.0.1602886 (R2021a). Both are running on a Dell XPS i7 3.5 GHz tower running Windows 10, and both have an integrated editor, code analysis, debugging, the ability to edit variables, etc. You will probably find that Matlab is easier to install than Python/Anaconda/Spyder, but that's a minor one-time advantage.

Sliding average signal smoothing

Based on Nita Ghosh's Scientific data analysis with Python: Part 1

A "sliding average" smooth is simple to implement using the "mean" function in both languages.

 

Python

import numpy as np

import matplotlib.pyplot as plt

from pytictoc import TicToc

t = TicToc()

plt.rcParams["font.size"] = 16

plt.rcParams['figure.figsize'] = (20, 10)

sigRate = 1000 #Hz

time = np.arange(0,3, 1/sigRate)

n = len(time)

p = 30 #poles for random interpolation

ampl = np.interp(np.linspace(0,p,n),np.arange(0,p),np.random.rand(p)*30)

noiseamp = 5

noise = noiseamp*np.random.randn(n)

signal = ampl + noise

 

t.tic() # start clock

plt.plot(time, ampl)

plt.plot(time, signal)

filtSig = np.zeros(n)

k = 50

for i in range(k,n-k-1):

    filtSig[i] = np.mean(signal[i-k:i+k])

plt.plot(time, filtSig)   

t.toc()  # stop clock print time

 

# Save signal to file

np.savetxt('SlidingAverageSignal.out',(time, signal), delimiter=',')

 

In the first several lines, Python must import some packages that are required for numerical computing, such as numpy and matplotlib. Matlab has much of that already built in, but also has optional add-on "toolboxes" for special purposes. The program creates a simulated noisy signal in the first block of code and then uses the "savetxt" function to save it to disk for Matlab to read, ensuring that the exact same signal is used for both:


Matlab

clear

clf

load SlidingAverageSignal.out

x=SlidingAverageSignal(1,:);

y=SlidingAverageSignal(2,:);

 

tic % start clock

lx=length(x);

filtsig=zeros(1,lx);

k=50; % smooth width

for i=k:lx-k-1

    filtsig(i)=mean(y(i-k+1:i+k));

end

plot(x,y,'g',x,filtsig,'linewidth',1.5)

toc % stop clock and print elapsed time

 

 

The fundamental thing here is the use of the "mean" function in both languages to compute the mean of the 2k points surrounding each point of the smoothed signal. In Python this is done by the lines:

 

k = 50

for i in range(k,n-k-1):

    filtSig[i] = np.mean(signal[i-k:i+k])

 

and in Matlab by:

 

k=50;

for i=k:lx-k-1

    filtsig(i)=mean(y(i-k+1:i+k));

end

 

You can see how similar they are. (I have used the same variable names to make the comparison easier.) A key difference is how a block of code is specified, such as a loop or a function definition. In Python, this is done using indentation, which is rigorously enforced. In Matlab, it is done using "end" statements, and indentation is optional (but is easily done by the "Smart Indent" in the right-click menu). Python uses square brackets to enclose array indexes rather than parentheses as Matlab does. Another important difference: Python arrays are indexed from zero; Matlab's are indexed from 1. Thus, the first two elements of an array A in Python are A[0] and A[1], but in Matlab they are A(1) and A(2). Also, Python uses # to indicate comment lines; Matlab uses %.

To compare the execution times, I have used the "tic" and "toc" statements (inbuilt in Matlab, but added to Python by the "TicToc" package) to start and stop a timer. I have bold-faced those lines in the code, to make it clear that I am counting and timing only the "inner" portion of the code that would have to be repeated if you have multiple data sets to process. I do not count the time required by the initial set-up, loading of packages, and other things that need be done only once.

 

 

Python: 8 lines; elapsed time 0.04 - 0.08 sec

Matlab: 7 lines; elapsed time 0.007 - 0.008 sec

The results show that the programs are about the same length but that Matlab is a bit faster in this case.
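As an aside (not part of the line counts or timings above), the same sliding average can also be written in Python without an explicit loop, by convolving the signal with a flat window. This is only a minimal sketch: it uses a stand-in random signal rather than the simulated signal created above, but the same names (signal, k, filtSig) for easy comparison.

import numpy as np
signal = np.random.randn(3000) # stand-in for the noisy signal created above
k = 50 # smooth half-width, as in the loop versions
window = np.ones(2*k)/(2*k) # flat (rectangular) window of 2k points
filtSig = np.convolve(signal, window, mode='same') # vectorized sliding average

The Matlab equivalent would be conv(y,ones(1,2*k)/(2*k),'same'), which likewise avoids the explicit loop.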


Fourier transform and (de)convolution

The Fourier transform (FT) is fundamental for computing frequency spectra, convolution, and deconvolution. The brief codes here implement Fourier convolution: they create a long vector named "a" consisting of random numbers, compute its FT, multiply it by itself, and then inverse-transform the result, using the fft and ifft functions. As before, tic and toc mark the timed blocks. (Fourier deconvolution would be similar, except that the two Fourier transforms would be divided rather than multiplied.)

 

Python

 

import numpy as np

from scipy import fft

from pytictoc import TicToc

t = TicToc()

 

t.tic() # start clock

min_len = 93059  # prime length is worst case for speed

a = np.random.randn(min_len)

b = fft.fft(a)

c=b*b

d=fft.ifft(c)

t.toc()  # stop clock and print elapsed time

Matlab

tic

MinLen = 93059; % prime length is worst case for speed

a=randn(1,MinLen);

b=fft(a);

c=b.*b;

d=ifft(c);

toc


Python: 5 lines; elapsed time 0.01 - 0.04 sec (The execution time apparently varies with different random number sequences).
Matlab: 5 lines; elapsed time 0.008 - 0.009 sec
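To illustrate the deconvolution variant mentioned above, here is a minimal Python sketch (not part of the timed comparison). It uses made-up data: a rectangular pulse is convolved with a known exponential impulse response by multiplying their transforms, and then recovered by dividing the transforms. The small constant added to the denominator is my own guard against division by near-zero frequency components; real, noisy data generally needs more careful treatment.

import numpy as np
from scipy import fft
n = 1000
signal = np.zeros(n)
signal[200:300] = 1.0 # original signal: a rectangular pulse
impulse = np.exp(-np.arange(n)/20.0) # known exponential impulse response
convolved = np.real(fft.ifft(fft.fft(signal)*fft.fft(impulse))) # Fourier convolution
recovered = np.real(fft.ifft(fft.fft(convolved)/(fft.fft(impulse) + 1e-12))) # Fourier deconvolution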

Classical Least Squares

The Classical Least Squares (CLS) technique has long been used in the spectroscopic analysis of mixtures, where the spectra of the individual components are known but overlap in the mixture spectrum. A comparison of Python and Matlab for this method is instructive. (I wrote the Matlab code first, copied and pasted it into the Spyder editor, and converted it line by line.) After the required importation of the Python packages, the codes NormalEquationDemo.py (graphic) and NormalEquationDemo.m (graphic) are remarkably similar; both use the same sequence of operations and the same variable names to create and plot a noisy simulated signal vector A that is the sum of three peaks of known shape but unknown amplitude, and then to measure the amplitudes C in the noisy signal by Classical Least Squares. The percent errors between the true and measured peak heights are printed next to each peak. The code similarities are striking, differing mainly in the use of indentation, the way functions are defined, the way arrays are indexed, the way vectors are concatenated into matrices, and the coding of matrix transpose, exponentiation, and dot products. The elapsed times are essentially the same.
 

Python:


import numpy as np
import matplotlib.pyplot as plt
from numpy.linalg import inv
from pytictoc import TicToc
t = TicToc()
t.tic()

# Define a Gaussian peak function
def gaussian(x, center, amplitude, width):
    # This generates a single peak with a gaussian lineshape,
    # with a specific center, amplitude, and width.
    # width is the FWHM (full width at half maximum)
    return amplitude*np.exp(-((x-center)/(0.60056120439323*width))**2)

# Create a set of spectra G representing the components of the mixture
NumPoints=1000
X=np.linspace(200,800,NumPoints)
center1=450
G1=gaussian(X,center1,1,100)
center2=500
G2=gaussian(X,center2,1,120)
center3=550
G3=gaussian(X,center3,1,130)

E=np.c_[G1,G2,G3] # compose the separate G vectors into a matrix E
noiseamp=.01
# Specify the amplitudes of the components in the mixture
amp1=1 # Amplitude (height) of peak 1
amp2=.5 # Amplitude (height) of peak 2
amp3=.3 # Amplitude (height) of peak 3
# Compute the simulated noisy mixture spectrum
A=amp1*G1+amp2*G2+amp3*G3+noiseamp*np.random.randn(len(X)) 
 
t.tic() # Start clock for execution timing
# Apply normal equation to compute concentration vector C
C = inv(E.T.dot(E)).dot(E.T).dot(A)

plt.plot(X,A)
plt.plot(X,C[0]*G1)
plt.plot(X,C[1]*G2)
plt.plot(X,C[2]*G3)
PercentDifference1=100*(amp1-C[0])/amp1
PercentDifference2=100*(amp2-C[1])/amp2
PercentDifference3=100*(amp3-C[2])/amp3
plt.text(center1,C[0],PercentDifference1, fontsize=22)
plt.text(center2,C[1],PercentDifference2, fontsize=22)
plt.text(center3,C[2],PercentDifference3, fontsize=22)
plt.title('Python: Percent error of peak heights measured by Classical Least Squares')
t.toc()  # stop clock and print elapsed time
 
print(C) # Print measured amplitudes


Matlab:

NumPoints=1000;
X=linspace(200,800,NumPoints)'; % Creates NumPoints x 1 vector X
center1=450;
G1=gaussian(X,center1,1,100); % Gaussian peak function defined at the end of this script
center2=500;
G2=gaussian(X,center2,1,110);
center3=550;
G3=gaussian(X,center3,1,120);

E=[G1,G2,G3]; % compose the separate G vectors into a matrix E
noiseamp=.01;
% Specify the amplitudes of the components in the mixture
amp1=1; % Amplitude (height) of peak 1
amp2=.5; % Amplitude (height) of peak 2
amp3=.3; % Amplitude (height) of peak 3
% Compute the simulated noisy mixture spectrum
A=amp1.*G1+amp2.*G2+amp3.*G3+noiseamp.*randn(length(X),1) ;

tic % Start clock for execution timing
% Apply normal equation to compute concentration vector C
C = inv(E'*E)*E'*A;

plot(X,A,X,C(1).*G1,X,C(2).*G2,X,C(3).*G3)
PercentDifference1=100.*(amp1-C(1))/amp1;
PercentDifference2=100*(amp2-C(2))/amp2;
PercentDifference3=100*(amp3-C(3))/amp3;
text(center1,C(1),num2str(PercentDifference1),'FontSize',14);
text(center2,C(2),num2str(PercentDifference2),'FontSize',14);
text(center3,C(3),num2str(PercentDifference3),'FontSize',14);
title('Matlab: Percent errors of peak heights measured by Classical Least Squares')
toc % stop clock and print elapsed time

disp(C) % Print measured amplitudes

% Define a Gaussian peak function
function g=gaussian(x,center,amplitude,width)
   % This generates a single peak with a gaussian lineshape,
   % with a specific center, amplitude, and width.
   % width is the FWHM (full width at half maximum)
   g=amplitude*exp(-((x-center)/(0.60056120439323*width)).^2);
end

Note: The bulk of the code in both cases is devoted to creating a simulated signal and plotting. The actual computation of concentrations by CLS takes only one line in both languages: C = inv(E.T.dot(E)).dot(E.T).dot(A) in Python and C = inv(E'*E)*E'*A; in Matlab.

Python: 50 lines; elapsed time 0.017 sec.
Matlab: 41 lines; elapsed time 0.018 sec.
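A related aside, not part of the original code: forming the normal equation with an explicit matrix inverse, as in the one-liners above, can be numerically troublesome when the component spectra are nearly collinear. Both languages provide least-squares solvers that compute the same C more robustly (np.linalg.lstsq in Python, the backslash operator E\A in Matlab). A tiny self-contained Python example with made-up numbers:

import numpy as np
E = np.array([[1.0, 0.2],
              [0.5, 1.0],
              [0.1, 0.4]]) # columns are two component "spectra" sampled at three wavelengths
A = E @ np.array([2.0, 0.5]) # mixture spectrum with known amplitudes 2.0 and 0.5
C, *_ = np.linalg.lstsq(E, A, rcond=None) # least-squares solution of E*C = A
print(C) # approximately [2.0, 0.5]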



Peak Detection

(based on https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks.html)

The automatic detection of peaks is a common requirement. The codes here are based on the Python example in the SciPy documentation linked above. Both languages have a "find peaks" function, but Matlab's is rather slow, so I prefer to code my own algorithm using nested loops (each with a required "end" statement, indented for clarity).


The signal used here is a portion of an electrocardiogram that is included as a sample signal in the scipy package for Python (and saved by Python for Matlab's use). The codes detect the peaks that exceed a specified amplitude threshold and mark them with a small red "x".

 

Python

import numpy as np

import matplotlib.pyplot as plt

from scipy.misc import electrocardiogram

from scipy.signal import find_peaks

from pytictoc import TicToc

t = TicToc()

x = electrocardiogram()[2000:4000]

 

t.tic() # start clock

peaks, _ = find_peaks(x, height=0)

plt.plot(x)

plt.plot(peaks, x[peaks], "x")

plt.plot(np.zeros_like(x), "--", color="gray")

plt.show()

t.toc()  # stop clock and print elapsed time
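 

np.savetxt('electrocardiogram.out', x, delimiter=',') # Save signal to file for Matlab, as described above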

 

Matlab
load electrocardiogram.out;

y=electrocardiogram;

tic % start clock

  plot(1:length(y),y,'-k')

  height=0;

  x=1:length(y);

  peak=0;

  for k = 2:length(x)-1

      if y(k) > y(k-1)

          if y(k) > y(k+1)

              if y(k)>height

                  peak = peak + 1;

                  P(peak,:)=[x(k) y(k)];

              end

          end

      end

  end

  hold on

  for k=1:length(P)

      text(P(k,1)-12,P(k,2),'x','color',[1 0 0])

  end

  grid

  hold off

toc % stop clock and print elapsed time

 

Python: 5 lines; elapsed time 0.01 - 0.011 sec

Matlab: 19 lines; elapsed time 0.007 - 0.0071 sec as written

(1.5 sec using the Matlab inbuilt findpeaks.m function)
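For completeness, here is how the same simple threshold-and-local-maximum loop might look in Python, avoiding find_peaks in the same way the Matlab code above avoids findpeaks.m. This is only an illustrative sketch using a stand-in signal; it is not the code that was timed.

import numpy as np
y = np.cumsum(np.random.randn(2000)) # stand-in for the electrocardiogram segment
height = 0 # amplitude threshold
peaks = []
for k in range(1, len(y) - 1):
    if y[k] > y[k-1] and y[k] > y[k+1] and y[k] > height:
        peaks.append((k, y[k])) # store the index and amplitude of each detected peak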


Iterative least-squares fitting

(based on https://chrisostrouchov.com/post/peak_fit_xrd_python/)

 

In both Python and Matlab, iterative curve fitting is more complex than the previous examples. Both languages have optimization functions ("optimize.minimize" in the scipy package for Python; the inbuilt "fminsearch" function in Matlab). The Python optimize function is more flexible and can use several different optimization methods. Matlab's fminsearch uses the Nelder-Mead method, but there is an optional Optimization Toolbox that provides additional methods. For this comparison I am using the default optimization method in both cases.
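As an aside (not what the timed code below does), scipy's optimize.minimize accepts a method argument, so the Nelder-Mead simplex algorithm used by Matlab's fminsearch can be requested explicitly if an apples-to-apples comparison of the optimizers themselves is wanted. A toy self-contained example with a simple quadratic objective of my own:

from scipy import optimize
def toy_cost(p):
    return (p[0] - 3.0)**2 + (p[1] + 1.0)**2 # minimum at p = [3, -1]
result = optimize.minimize(toy_cost, [0.0, 0.0], method='Nelder-Mead')
print(result.x) # approximately [3, -1]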


Python

import math

import numpy as np

import matplotlib.pyplot as plt

from scipy import optimize

from pytictoc import TicToc

t = TicToc()

def g(x, A, μ, σ):

    return A / (σ * math.sqrt(2 * math.pi)) * np.exp(-(x-μ)**2 / (2*σ**2))

def f(x):

    return np.exp(-(x-2)**2) + np.exp(-(x-6)**2/10) + 1/(x**2 + 1)

n = 500 # Number of data points in signal

x = np.linspace(-10, 10, n)

# Create a simulated noisy signal that is the sum of two Gaussian peaks

A_true_1 = [250.0, 4.0, 5.0] # true intensity, mean, and width of peak 1

A_true_2 = [20.0, -5.0, 1.0] # true intensity, mean, and width of peak 2

y = g(x, *A_true_1) + g(x, *A_true_2) + np.random.randn(n)

 

t.tic() # start clock

def cost(parameters):

    # mean squared difference between the two-Gaussian model and the data

    g_0 = parameters[:3]

    g_1 = parameters[3:6]

    return np.sum(np.power(g(x, *g_0) + g(x, *g_1) - y, 2)) / len(x)

initial_guess = [5, 10, 4, -5, 10, 4]

result = optimize.minimize(cost, initial_guess)

g_0 = result.x[:3] # fitted parameters of peak 1

g_1 = result.x[3:6] # fitted parameters of peak 2

fig, ax = plt.subplots()

ax.scatter(x, y, s=1)

ax.plot(x, g(x, *g_0))

ax.plot(x, g(x, *g_1))

ax.plot(x, g(x,*g_0) + g(x,*g_1))

ax.plot(x, y)

t.toc()  # stop clock and print elapsed time

 

np.savetxt('SavedFromPython.out', (x,y), delimiter=',') # Save signal to file

 

 

Matlab

clear

clf

load SavedFromPython.out

x=SavedFromPython(1,:);

y=SavedFromPython(2,:);

start=[-5 3 6 3];

format compact

global PEAKHEIGHTS

 

tic % Start clock

FitResults=fminsearch(@(lambda)(fitfunction(lambda,x,y)),start);

NumPeaks=round(length(start)./2);

for m=1:NumPeaks

  A(m,:)=shapefunction(x,FitResults(2*m-1),FitResults(2*m));

end

model=PEAKHEIGHTS'*A;

plot(x,y,'- r',x,model)

hold on;

for m=1:NumPeaks

    plot(x,A(m,:)*PEAKHEIGHTS(m));

end

hold off

toc % stop clock and print elapsed time

 

function err = fitfunction(lambda,t,y)

global PEAKHEIGHTS

A = zeros(length(t),round(length(lambda)/2));

  for j = 1:length(lambda)/2

     A(:,j) = shapefunction(t,lambda(2*j-1),lambda(2*j))';

  end

PEAKHEIGHTS = A\y';

z = A*PEAKHEIGHTS;

err = norm(z-y');

end

 

function g = shapefunction(x,a,b)

g = exp(-((x-a)./(0.6005612.*b)) .^2); % Expression for peak shape

end

 

Python: 16 lines; elapsed time 0.02 - 0.08 sec

Matlab: 25 lines; elapsed time 0.01 - 0.03 sec

 

The Matlab algorithm here is the same as used by my peakfit.m and interactive ipf.m functions. The elapsed times of both Matlab and Python vary with the random noise sample and, despite the rather different methods used, are quite similar for this unconstrained two-Gaussian fit.

An excellent step-by-step introduction to iterative curve fitting in Python is by Emily Grace Ripka.

The bottom line is that Matlab is slightly faster for most operations, at least on my hardware (your mileage may vary), but the fact that Python is free and open-source is a strong argument in its favor. And Python is certainly faster than the free Matlab clone Octave. I would recommend this: if you have a background in computer science or have taken some courses, then Python is probably the better choice. Others might find Matlab easier to approach. I like both languages, but I have mainly used Matlab, because it acts and feels more finished and polished than either Python/Spyder or Octave, in my opinion. But what do I know? I was trained as a chemist, not a programmer.

 
For further reading, see https://realpython.com/matlab-vs-python/#further-reading


This page is part of "A Pragmatic Introduction to Signal Processing", created and maintained by Prof. Tom O'Haver, Department of Chemistry and Biochemistry, The University of Maryland at College Park. Comments, suggestions and questions should be directed to Prof. O'Haver at toh@umd.edu.

Created September, 2021. Updated January, 2022.