Simplified Python Audio Features Extraction
spafe aims to simplify feature extraction from mono audio files. It includes various computations related to filter banks, spectrograms, frequencies and cepstral features. The library has the following structure:
- Bark filter banks
- Gammatone filter banks
- Linear filter banks
- Mel filter banks
- Bark spectrogram
- CQT spectrogram
- ERB spectrogram
- Mel spectrogram
- Bark Frequency Cepstral Coefficients (BFCCs)
- Constant Q-transform Cepstral Coefficients (CQCCs)
- Gammatone Frequency Cepstral Coefficients (GFCCs)
- Linear Frequency Cepstral Coefficients (LFCCs)
- Linear Prediction Components (LPCs)
- Mel Frequency Cepstral Coefficients (MFCCs)
- Inverse Mel Frequency Cepstral Coefficients (IMFCCs)
- Magnitude based Spectral Root Cepstral Coefficients (MSRCCs)
- Normalized Gammachirp Cepstral Coefficients (NGCCs)
- Power-Normalized Cepstral Coefficients (PNCCs)
- Phase based Spectral Root Cepstral Coefficients (PSRCCs)
- Perceptual Linear Prediction Coefficients (PLPs)
- Rasta Perceptual Linear Prediction Coefficients (RPLPs)
- Dominant frequencies
- Fundamental frequencies
The theory behind the features computed using spafe can be summarized in the following graph:
[Figure: graph summarizing the theory behind spafe's features]
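Many of the cepstral features listed above share one pipeline: a filter bank is applied to a power spectrum, the filter bank energies are compressed, and a discrete cosine transform (DCT) yields the coefficients. The following is a minimal, standard-library-only sketch of that pipeline for triangular mel filters with log compression (the MFCC recipe). It assumes O'Shaughnessy's mel formula and FFT-bin-aligned filters; the function names are hypothetical and this is not spafe's implementation, which may scale and normalize differently. Features such as BFCCs or LFCCs swap in a different filter bank, while others (e.g. PNCCs) replace the log with a power-law compression.

```python
import math

def hz_to_mel(f_hz):
    """O'Shaughnessy's mel formula (one of several mel-scale conventions)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_bank(nfilts, nfft, fs, low_freq=0.0, high_freq=None):
    """Hypothetical helper: `nfilts` triangular filters over nfft // 2 + 1 FFT bins."""
    high_freq = fs / 2 if high_freq is None else high_freq
    low_mel, high_mel = hz_to_mel(low_freq), hz_to_mel(high_freq)
    # nfilts + 2 points equally spaced on the mel scale give each filter's edges and center
    mel_points = [low_mel + i * (high_mel - low_mel) / (nfilts + 1) for i in range(nfilts + 2)]
    bins = [int(math.floor((nfft + 1) * mel_to_hz(m) / fs)) for m in mel_points]
    fbank = [[0.0] * (nfft // 2 + 1) for _ in range(nfilts)]
    for j in range(nfilts):
        left, center, right = bins[j], bins[j + 1], bins[j + 2]
        for k in range(left, center):   # rising slope
            fbank[j][k] = (k - left) / (center - left)
        for k in range(center, right):  # falling slope, peak of 1.0 at `center`
            fbank[j][k] = (right - k) / (right - center)
    return fbank

def cepstral_coefficients(power_spectrum, fbank, num_ceps=13, eps=1e-10):
    """Hypothetical helper: filter bank energies -> log -> unnormalized DCT-II."""
    energies = [sum(w * p for w, p in zip(filt, power_spectrum)) for filt in fbank]
    log_e = [math.log(max(e, eps)) for e in energies]  # eps guards against log(0)
    n = len(log_e)
    if num_ceps > n:
        raise ValueError("num_ceps cannot exceed the number of filters")
    return [
        sum(log_e[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        for k in range(num_ceps)
    ]
```

For example, `cepstral_coefficients(power, mel_filter_bank(24, 512, 16000))` maps a 257-bin power spectrum of one frame to 13 coefficients. Keeping the filter bank as a separate function mirrors the idea that the filter bank is the main thing that changes from one cepstral feature to another.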
spafe requires:
If you want to use the visualization module/functions of spafe, you will need to install:
- Matplotlib (>= 3.5.2)
Once you have the dependencies installed, use one of the following install options:
- To freshly install spafe:
pip install spafe
- To update an existing installation:
pip install -U spafe
- spafe is also available on Anaconda:
conda install spafe
- To build spafe from source:
git clone git@github.com:SuperKogito/spafe.git
cd spafe
python setup.py install
Unlike most existing audio feature extraction libraries (python_speech_features, SpeechPy, surfboard and Bob), spafe provides more options for spectral feature extraction algorithms, notably:
- Bark Frequency Cepstral Coefficients (BFCCs)
- Constant Q-transform Cepstral Coefficients (CQCCs)
- Gammatone Frequency Cepstral Coefficients (GFCCs)
- Power-Normalized Cepstral Coefficients (PNCCs)
- Phase based Spectral Root Cepstral Coefficients (PSRCCs)
Most existing libraries, to their credit, provide great implementations for feature extraction, but they are unfortunately limited to Mel Frequency Cepstral Coefficients (MFCCs) and, at best, additionally offer Bark frequency and linear predictive coefficients. Librosa, for example, includes great implementations of various algorithms (though among the features above only MFCCs and LPCs are included), based on the Short Time Fourier Transform (STFT), which is theoretically more accurate but slower than the Discrete Fourier Transform used in spafe's implementation.
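To make the STFT/DFT terminology above concrete: an STFT splits the signal into short, overlapping, windowed frames and applies a DFT to each frame. The sketch below uses only the standard library and hypothetical helper names; it frames a signal, applies a Hamming window, computes a naive O(N^2) DFT (practical code would use an FFT), and reads off the dominant frequency of a frame from its magnitude spectrum. It is an illustration of the transforms, not spafe's code.

```python
import cmath
import math

def frame_signal(signal, frame_len, hop):
    """Split `signal` into overlapping frames; trailing samples that do not fill a frame are dropped."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def dft_magnitude(frame):
    """Naive O(N^2) DFT; returns magnitudes of bins 0..N//2 (the non-negative frequencies)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def dominant_frequency(frame, fs):
    """Frequency in Hz of the strongest DFT bin of the Hamming-windowed frame."""
    windowed = [x * w for x, w in zip(frame, hamming(len(frame)))]
    mags = dft_magnitude(windowed)
    peak_bin = max(range(len(mags)), key=mags.__getitem__)
    return peak_bin * fs / len(frame)  # bin index -> Hz

if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(1000)]  # 1 kHz sine
    frames = frame_signal(tone, frame_len=256, hop=128)
    print(len(frames), dominant_frequency(frames[0], fs))  # prints: 6 1000.0
```

The frequency resolution of each frame is `fs / frame_len` (31.25 Hz here), which is the usual trade-off when choosing the frame length.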
Various examples of how to use spafe are available in the documentation at https://superkogito.github.io/spafe
<!> Please make sure you are referring to the correct documentation version.
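Since the documentation is versioned, it helps to know which spafe release is installed. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); it reads the installed package metadata and does not rely on any spafe attribute. The helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package="spafe"):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

if __name__ == "__main__":
    v = installed_version()
    print(f"spafe {v}" if v else "spafe is not installed")
```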
Contributions are welcome and encouraged. To learn more about how to contribute to spafe, please refer to the Contributing guidelines.
- If you want to cite spafe as software, please cite the version used, as indexed in Zenodo:
Ayoub Malek, Hadrien Titeux, Stefano Borzì, Christian Heider Nielsen, Fabian-Robert Stöter, Hervé BREDIN, & Kevin Mattheus Moerman. (2023). SuperKogito/spafe: v0.3.2 (v0.3.2). Zenodo. https://doi.org/10.5281/zenodo.7686438
- You can also cite spafe's paper as follows:
Malek, A., (2023). Spafe: Simplified python audio features extraction. Journal of Open Source Software, 8(81), 4739, https://doi.org/10.21105/joss.04739