
Guided Tour: Signal processing

Olivier Lartillot edited this page Feb 10, 2018 · 11 revisions

This is a part of the MiningSuite Guided Tour.

Here are examples of operations that can be performed using the SigMinr module in the MiningSuite. In this tutorial, we will be using mostly audio files as examples, but these experiments can be carried out on any kind of signal.

Basic signal processing operators

sig.signal

You can load various types of audio files, for instance MP3:

sig.signal('beethoven.mp3')

You can also load other types of signals, for instance delimited text files or spreadsheet files:

sig.signal('insurance.csv')

Only numerical fields are taken into consideration.

Various options are available, as detailed in the sig.signal online documentation.

sig.spectrum

To see all the frequencies contained in your signal, use sig.spectrum:

sig.spectrum('test.wav')

You can select a particular range of frequencies:

sig.spectrum('test.wav','Min',10,'Max',1000)

To compute the power spectrogram:

sig.spectrum('test.wav','Frame','Power')

To learn more about the maths behind this transformation, and all the possible options, check the sig.spectrum online documentation.
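To get an intuition for what a spectrum contains, here is a minimal sketch in plain Matlab (no MiningSuite required). It builds a synthetic signal from two sinusoids and computes its magnitude spectrum with fft. The signal, the sampling rate and the variable names are made up for illustration, and this is not necessarily how sig.spectrum works internally.

```matlab
% Synthetic test signal: 1 second at 1000 Hz, mixing 50 Hz and 120 Hz sinusoids.
sr = 1000;                          % sampling rate in Hz (assumed)
t = (0:sr-1)' / sr;                 % time axis in seconds
x = sin(2*pi*50*t) + 0.5*sin(2*pi*120*t);

N = numel(x);                       % N is even here
X = fft(x);                         % complex spectrum
mag = abs(X(1:N/2+1)) / N;          % magnitude of the non-negative frequencies
freq = (0:N/2)' * sr / N;           % frequency of each bin in Hz

% Keep only a frequency range, similar in spirit to the 'Min' and 'Max' options.
keep = freq >= 10 & freq <= 200;
f_sel = freq(keep);
[~, idx] = max(mag(keep));
strongest = f_sel(idx);             % frequency of the strongest component in that range
```

Plotting mag against freq (for instance with plot(freq,mag)) shows one peak per sinusoid.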

sig.frame

The analysis of a whole temporal signal leads to a global description of the average value of the feature under study. In order to take into account the dynamic evolution of the feature, the analysis has to be carried out on a short-term window that moves chronologically along the temporal signal. Each position of the window is called a frame. For instance:

f = sig.frame('test.wav','FrameSize',1,'Hop',0.5)

Then we can easily perform any computation on each of the successive frames. For instance, the computation of the spectrum for each successive frame can be written as:

sig.spectrum(f,'Max',1000)

What you see is the progressive evolution of the spectrum over time, frame by frame. This is called a spectrogram.

More simply, you can compute the same thing by writing just one command:

sig.spectrum('test.wav','Max',1000,'Frame')

Here the frame size was chosen by default. You can of course specify the frame size yourself:

sig.spectrum('test.wav','Max',1000,'Frame','FrameSize',1,'Hop',0.5)
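The frame decomposition itself can be sketched in plain Matlab as follows. This sketch assumes that the frame size is a duration in seconds and that the hop is a fraction of the frame size; that reading of 'FrameSize' and 'Hop' is an assumption to check against the sig.frame documentation. The stand-in signal and the variable names are made up for illustration.

```matlab
sr = 100;                            % sampling rate in Hz (assumed)
x = randn(5*sr, 1);                  % 5 seconds of noise as a stand-in signal

frameSize = 1;                       % frame duration in seconds
hop = 0.5;                           % hop as a fraction of the frame size (assumed meaning)

L = round(frameSize * sr);           % frame length in samples
H = round(hop * L);                  % hop in samples
nFrames = floor((numel(x) - L) / H) + 1;

frames = zeros(L, nFrames);          % one frame per column
for k = 1:nFrames
    start = (k-1)*H + 1;
    frames(:, k) = x(start : start+L-1);
end

% Spectrogram: magnitude spectrum of each frame (one column per frame).
S = abs(fft(frames));                % fft operates along columns
S = S(1:floor(L/2)+1, :);            % keep the non-negative frequencies
```

With a hop of 0.5, each frame overlaps half of the previous one, so a 5-second signal yields 9 one-second frames.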

For more information, see the sig.frame online documentation.

sig.flux

Once we have computed the spectrogram

s = sig.spectrum('test.wav','Frame')

we can evaluate how fast the signal changes from one frame to the next one by computing the spectral flux:

sig.flux(s)

In the resulting curve, you see peaks that indicate particular moments where the spectrum has changed a lot from one frame to the next one. In other words, those peaks indicate that something new has appeared at that particular moment.
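As an illustration of the idea, here is a minimal sketch that measures the change between successive frames as the Euclidean distance between their spectra. The toy spectrogram is made up, and sig.flux may use a different distance or normalisation.

```matlab
% Toy spectrogram: 4 frequency bins (rows) x 5 frames (columns), made up for illustration.
S = [1 1 1 0 0;
     0 0 0 2 2;
     1 1 1 1 1;
     0 0 0 0 0];

D = diff(S, 1, 2);                  % difference between each frame and the previous one
flux = sqrt(sum(D.^2, 1));          % Euclidean distance for each frame transition
```

Here the flux is zero everywhere except at the transition from frame 3 to frame 4, where the energy moves from the first bin to the second one.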

You can compute spectral flux directly using the simple command:

sig.flux('test.wav')

The flux can be computed from representations other than the spectrum. For more information, see the sig.flux online documentation.

sig.rms

You can get an average of the amplitude of the signal by computing its Root Mean Square (or RMS):

sig.rms('test.wav')

But this gives a single numerical value, which is not very informative (although you can compare different signals in this way).

You can also see the temporal evolution of this averaged amplitude by computing RMS frame by frame:

sig.rms('test.wav','Frame')
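The RMS is the square root of the mean of the squared samples. A minimal sketch in plain Matlab, on a made-up signal that is quiet during its first second and loud during its second one, shows the difference between the global value and the frame-by-frame curve. The frame length and hop are arbitrary choices, not the defaults of sig.rms.

```matlab
sr = 100;                            % sampling rate in Hz (assumed)
t = (0:2*sr-1)' / sr;                % 2 seconds
x = [0.1*sin(2*pi*5*t(1:sr)); sin(2*pi*5*t(sr+1:end))];  % quiet, then loud

globalRms = sqrt(mean(x.^2));        % one value for the whole signal

L = 50; H = 25;                      % frame length and hop in samples (arbitrary)
nFrames = floor((numel(x) - L) / H) + 1;
frameRms = zeros(1, nFrames);
for k = 1:nFrames
    seg = x((k-1)*H + (1:L));        % samples of the k-th frame
    frameRms(k) = sqrt(mean(seg.^2));
end
```

The single value globalRms hides the jump in amplitude, whereas frameRms rises sharply halfway through the signal.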

We have now seen two ways to detect new events in the signal: sig.flux considers the changes in the spectrum while sig.rms considers the contrasts in the amplitude of the signal.

For more information, see the sig.rms online documentation.

sig.envelope

From a signal, we can compute the envelope, which shows the global outer shape of the signal.

e = sig.envelope('test.wav')

It is particularly useful for showing the long-term evolution of the signal, and is applied in particular to the detection of events. It is therefore closely related to sig.rms, which we just saw.

You can listen to the envelope itself. It is played as a simple noise that follows exactly the same envelope.

sig.play(e)

The envelope can be estimated using a wide range of techniques, with many parameters that can be tuned. For more information, see the sig.envelope online documentation.

sig.peaks

For any kind of representation (curve, spectrogram, etc.) you can easily find the peaks showing the local maxima by calling sig.peaks. For instance, on the envelope we just computed:

sig.peaks(e)

You can specify that you just want the highest peak:

sig.peaks(e,'Total',1)

You can also set a threshold, for instance selecting all peaks higher than half the maximum value:

sig.peaks(e,'Threshold',.5)

You can see that by default sig.peaks selects some peaks in a kind of adaptive way. You can turn off this adaptive peak picking by toggling off the 'Contrast' option:

sig.peaks(e,'Contrast',0)
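For comparison, here is a minimal sketch of non-adaptive peak picking in plain Matlab: every local maximum is a peak, optionally filtered by a threshold or reduced to the single highest one. The toy curve is made up, and the exact semantics of the 'Threshold' and 'Contrast' options in sig.peaks may differ from this sketch.

```matlab
c = [0 1 0 4 1 2 0 6 0 1 0];         % toy curve, made up for illustration

% A sample is a local maximum if it is strictly greater than both neighbours.
inner = c(2:end-1);
isPeak = [false, inner > c(1:end-2) & inner > c(3:end), false];
allPeaks = find(isPeak);             % indices of all local maxima

% Keep only peaks higher than half the maximum value of the curve.
highPeaks = allPeaks(c(allPeaks) > 0.5 * max(c));

% Keep only the single highest peak.
[~, best] = max(c(allPeaks));
topPeak = allPeaks(best);
```

Without any selection, even the small bumps at positions 2, 6 and 10 count as peaks, which is why some form of threshold or contrast criterion is usually needed.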

For more information, see the sig.peaks online documentation.

sig.autocor

We saw that we can find all the frequencies in a signal by computing sig.spectrum. But the spectrum is actually focused on finding sinusoids. More generally, if you want to find any kind of periodicity in a signal, for instance in the envelope we just computed, you can compute an autocorrelation function by using sig.autocor:

sig.autocor(e)

Each peak in the autocorrelation function indicates a periodicity. But these periodicities are expressed as lags, which correspond to the duration of one period. If you want to see periodicities as frequencies instead (similar to sig.spectrum), use the 'Freq' option:

sig.autocor(e,'Freq')

And again you can specify the range of frequencies. For instance:

sig.autocor(e,'Freq','Max',10,'Hz')
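The relation between lags and frequencies can be sketched in plain Matlab. The example below uses a made-up pulse train (a periodic signal that is not a sinusoid), computes a simple unnormalised autocorrelation, and converts the best lag into a frequency. The lag range and the variable names are arbitrary, and sig.autocor may normalise or window the function differently.

```matlab
sr = 100;                            % sampling rate of the curve in Hz (assumed)
n = (0:4*sr-1)';                     % sample indices for 4 seconds
x = double(mod(n, 50) < 10);         % pulse train repeating every 50 samples (0.5 s)
x = x - mean(x);                     % remove the mean before correlating

maxLag = 1 * sr;                     % explore lags up to 1 second
ac = zeros(maxLag, 1);
for lag = 1:maxLag
    ac(lag) = sum(x(1:end-lag) .* x(1+lag:end));
end

% Look for the strongest periodicity, skipping very short lags
% where the signal is trivially similar to itself.
minLag = round(0.2 * sr);
[~, idx] = max(ac(minLag:end));
bestLag = idx + minLag - 1;          % in samples
period = bestLag / sr;               % lag expressed as a duration in seconds
frequency = 1 / period;              % same periodicity expressed in Hz
```

The strongest peak is found at a lag of 0.5 s, which corresponds to a frequency of 2 Hz.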

For more information, see the sig.autocor online documentation.

sig.filterbank

This part is more technical and not necessarily useful for you, but it is of interest to experts in signal processing. Here is an example of a more complex operation that can be performed: the decomposition of the signal into different channels corresponding to different frequency regions:

f = sig.filterbank('test.wav','CutOff',[-Inf,1000,5000])

This decomposes the initial audio file into 2 channels based on a bank of 2 filters: one low-pass filter with a cut-off frequency of 1000 Hz, and one band-pass filter between 1000 Hz and 5000 Hz.

You can play each channel separately:

sig.play(f)

You can then compute any operation on each channel separately, for instance:

e = sig.envelope(f)

And you can finally sum back all the channels together:

e = sig.sum(e)
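To illustrate the principle of splitting a signal into frequency regions and summing the channels back, here is a rough sketch in plain Matlab. It uses an idealised "brick-wall" split obtained by masking the FFT, which is an assumption made for simplicity and not the filter design used by sig.filterbank. The test signal and the variable names are made up.

```matlab
sr = 16000;                          % sampling rate in Hz (assumed)
t = (0:sr-1)' / sr;                  % 1 second
x = sin(2*pi*440*t) + sin(2*pi*3000*t);   % one low and one high component

N = numel(x);
X = fft(x);
k = (0:N-1)';
f = min(k, N-k) * sr / N;            % absolute frequency of each FFT bin in Hz

% Channel 1: everything below 1000 Hz. Channel 2: from 1000 Hz to 5000 Hz.
lowMask  = double(f < 1000);
bandMask = double(f >= 1000 & f < 5000);
low  = real(ifft(X .* lowMask));
band = real(ifft(X .* bandMask));

recombined = low + band;             % summing the channels back together
```

Each channel contains one of the two sinusoids, and since the whole signal lies below 5000 Hz, summing the two channels restores the original signal.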

For more information, see the sig.filterbank online documentation.

Data importation

You can import into the MiningSuite any data you have already computed in Matlab. For instance, let's say we generate an array using this Matlab command:

c = rand(100,1)

Then we can import this array as values of sig.signal. Here you need to know that sig.signal actually outputs a Matlab object of class sig.Signal. So to create your own object, use the sig.Signal method:

sig.Signal(c)

You can specify the sampling rate:

sig.Signal(c,'Srate',100)

There is no proper documentation of those classes for the moment.

Statistics

From a given computation in the MiningSuite, for instance:

s = sig.spectrum('test.wav')

we can compute various statistics:

  • the average: sig.mean(s)
  • the standard deviation: sig.std(s)
  • the histogram: sig.histogram(s)
  • distribution moments: sig.centroid(s), sig.spread(s), sig.skewness(s), sig.kurtosis(s)
  • other descriptions of the flatness of the distribution: sig.flatness(s), sig.entropy(s)

More information about all these operators can be found in the complete documentation.
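As an example of what the distribution moments mean, the first two can be sketched in plain Matlab by treating a spectrum as a distribution over frequencies. The values below are made up, and the operators listed above may normalise the data differently.

```matlab
freq = [100 200 300 400];            % bin frequencies in Hz (made up)
mag  = [1 2 1 0];                    % magnitude in each bin (made up)

p = mag / sum(mag);                  % normalise so that the values sum to 1
centroid = sum(freq .* p);           % first moment: "centre of mass" of the spectrum
spread = sqrt(sum((freq - centroid).^2 .* p));   % standard deviation around the centroid
```

Here the centroid is 200 Hz, since the distribution is symmetric around that bin, and the spread measures how far the energy lies from it on average.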

Structural analysis

sig.simatrix

From a spectrogram:

s = sig.spectrum('test.wav','Frame')

we can compute a self-similarity matrix that shows the structure of the audio recording:

sig.simatrix(s)

We can compute the same representation from other input analyses, such as MFCC:

m = aud.mfcc('george.wav','Frame')
sig.simatrix(m)

The structural analysis depends highly on the choice of frame length and hop factor for the frame decomposition:

m = aud.mfcc('george.wav','Frame','FrameSize',1,'Hop',.5)
sig.simatrix(m)
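The principle of a self-similarity matrix can be sketched in plain Matlab: each frame is compared with every other frame. This sketch uses the cosine similarity between frames, which is an assumption; sig.simatrix may rely on a different measure. The toy feature matrix is made up for illustration.

```matlab
% Toy feature matrix: 3 features (rows) x 6 frames (columns).
% Frames 1-3 resemble each other, as do frames 4-6.
F = [1.0 0.9 1.0 0.0 0.1 0.0;
     0.0 0.1 0.0 1.0 0.9 1.0;
     0.5 0.5 0.5 0.5 0.5 0.5];

nrm = sqrt(sum(F.^2, 1));            % Euclidean norm of each frame
Fn = F ./ repmat(nrm, size(F,1), 1); % normalise each column to unit length
S = Fn' * Fn;                        % S(i,j) = cosine similarity between frames i and j
```

The resulting matrix is symmetric, has ones on its diagonal, and shows two blocks of high similarity corresponding to the two groups of frames.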

For more information, see the sig.simatrix online documentation.

sig.novelty

Let's consider again our self-similarity matrix:

s = sig.spectrum('test.wav','Frame')
sm = sig.simatrix(s)

We can see in the matrix a succession of blocks along the diagonal, indicating successive homogeneous sections. We can automatically detect the temporal positions of those section changes by computing a novelty curve:

n = sig.novelty(sm)
sig.peaks(n)
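One classical way to obtain a novelty curve from a self-similarity matrix is to slide a checkerboard kernel along its diagonal. The sketch below illustrates that idea in plain Matlab on a made-up matrix; it is not necessarily the method that sig.novelty uses by default, and the kernel size is arbitrary.

```matlab
% Toy self-similarity matrix with two homogeneous blocks of 4 frames each.
S = [ones(4) zeros(4);
     zeros(4) ones(4)];

w = 2;                               % half-width of the kernel in frames (arbitrary)
K = [ ones(w) -ones(w);
     -ones(w)  ones(w)];             % checkerboard kernel

n = size(S, 1);
novelty = zeros(1, n);
for i = w+1 : n-w+1
    block = S(i-w : i+w-1, i-w : i+w-1);   % window centred on the boundary before frame i
    novelty(i) = sum(sum(K .* block));
end
[~, boundary] = max(novelty);        % frame where the new section starts
```

The curve is highest where the window straddles the two blocks, that is at frame 5, where the second section begins.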

For more information, see the sig.novelty online documentation.

sig.segment

The initial audio recording can then be segmented based on those segmentation points we have detected as peaks of the novelty curve:

n = sig.novelty(sm)
p = sig.peaks(n)
sg = sig.segment('test.wav',p)

We can listen to each successive segment separately:

sig.play(sg)

If we perform any of the MiningSuite operations on this segmented signal, the operations will be performed on each successive segment separately:

s = aud.mfcc(sg)

For more information, see the sig.segment online documentation.

Study further

Each operator accepts a set of options. There is a large list of operators available in SigMinr. Check the list of operators here and learn more about each operator by clicking on the links.

SigMinr is dedicated solely to general-purpose signal processing. The AudMinr package includes specializations of some of the operators (such as envelope, spectrum, etc.) to audio and auditory modeling. Check the Guided Tour related to audio and auditory modeling.

You can go back to the MiningSuite Guided Tour.
