C/C++ code to extract MFCC or FBank features from wav files.
- Download the following code from my GitHub and put the three directories (`BasicAudioToolBox`, `MFCC` and `inih`) under the same root directory.
- Switch the branch from `master` to `masterCPlus` in `BasicAudioToolBox` and `MFCC`. (The `master` branch works, but `masterCPlus` is highly recommended.)
- For Linux, run `make` in the `MFCC` directory to build the program. The target is `mfcc`.
- For Windows,
  - Add these files to your Visual Studio project.
  - Add the paths of `BasicAudioToolBox` and `inih` to the VC++ directories.
  - Add `_CRT_SECURE_NO_WARNINGS` to the preprocessor definitions.
  - Enable `openmp`. (If you don't want to use multiple threads, skip this step.)
  - Build the exe.
Run an example first. Enter the `MFCC` directory and run

    mfcc example/config.ini

The screen will display the following:

    finish reading the config.ini
    FFT passband 2 to 1024 out of 1 to 1024
    Mel passband 15.986084 to 3923.357581
    Thread ID 1
    Convert example/a001_0_30.wav to example/a001_0_30.fbank
    including :
    MFCCNum 40
    energyFlag 1
    zeroCrossingFlag 1
    brightFlag 1
    subBandEFlag 8
    the frame feature dimension is 408
    Sample Rate : 44100
    Number of channels : 2
    Each sample's size in byte : 3
    Each container's size in byte : 3
    Number of samples : 1323001
    total coef size: 612000
    post-processing...
    writing the doc...

These messages indicate the following steps for extracting the features:
- The program first reads `config.ini`.
- The Mel filter bank is generated according to the settings in the config.
- The WAV file is read.
- The features are computed and extracted.
- The feature file is written.
You should prepare the WAV files (.wav) first, then a config file and a file list. The following describes the format of each.
The WAV file should be PCM-encoded and have a standard 44-byte header. (It is okay if additional information chunks sit between the 44-byte header and the data chunk.) Most WAV files satisfy this. If the WAV was converted from MP3 or another format, check the header first.
If no new information appears on the screen for a long time while the program is running, check the WAV file.
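One way to check the header before running the program is sketched below. It follows the canonical RIFF/WAVE layout (a 16-byte `fmt ` chunk with format tag 1 for PCM, followed by `data` at byte offset 36). The function `wav_header_kind` is a hypothetical helper, not part of this repository, and it is stricter than necessary: files that store PCM under other format tags or a longer `fmt ` chunk are reported as unsupported here even if the program could read them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian integers from a byte buffer. */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Classify the first bytes of a WAV file (hypothetical helper).
 * Returns 1: canonical 44-byte PCM header ("data" directly after "fmt ").
 * Returns 2: PCM header, but another chunk sits before the data chunk.
 * Returns 0: too short, not RIFF/WAVE, or not a 16-byte PCM "fmt " chunk. */
int wav_header_kind(const unsigned char *h, size_t len)
{
    assert(h != NULL);
    if (len < 44) return 0;
    if (memcmp(h, "RIFF", 4) != 0 || memcmp(h + 8, "WAVE", 4) != 0) return 0;
    if (memcmp(h + 12, "fmt ", 4) != 0 || le32(h + 16) != 16) return 0;
    if (le16(h + 20) != 1) return 0;              /* 1 = uncompressed PCM */
    return memcmp(h + 36, "data", 4) == 0 ? 1 : 2;
}
```

Reading the first 44 bytes of a file with `fread` and passing them to this function is enough to spot files that were converted from another format and carry a non-PCM header.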
The config is read using `inih` (https://github.com/benhoyt/inih). A standard config is `example/config.ini` under the `MFCC` directory. All the listed options should be included; the program's behaviour is not guaranteed if any are omitted.
See https://github.com/hangtingchen/inih/blob/master/examples/test.ini for details of the `inih` format.
This section sets options related to pre-processing.
Key | Value | Notes |
---|---|---|
sampleRate | 8000/16000/44100/others | The sample rate must be set up front, so all WAV files should share the same sample rate. |
lowpassfre | >0 && <hipassfre && <sampleRate | The minimum frequency |
hipassfre | >0 && >lowpassfre && <=sampleRate | The maximum frequency |
preemphasise | =0: no pre-emphasis; 0-1: pre-emphasise the signal | The pre-emphasis coefficient |
zeroMeanSigFlag | 0/1 | Whether to make the input signal zero-mean |
wlen | wlen=(wlenInTime(ms))*sampleRate/1000 | The number of samples per window |
inc | inc=(incInTime(ms))*sampleRate/1000 | The number of samples per window shift |
vecNum | =1(mono/double channel WAV);=2(double channel WAV);=4(double channel WAV) | The number of channels of the output feature |
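The `wlen` and `inc` values are sample counts, so they have to be recomputed whenever the sample rate changes. A small sketch of the conversion from the table follows. The helper name `ms_to_samples` is hypothetical, and truncating integer division is an assumption, since the source does not say how fractional results should be handled.

```c
#include <assert.h>

/* Convert a duration in milliseconds to a sample count:
 * samples = ms * sampleRate / 1000.
 * Integer division truncates any fractional sample (assumption). */
long ms_to_samples(long ms, long sample_rate)
{
    assert(ms >= 0 && sample_rate > 0);
    return ms * sample_rate / 1000;
}
```

For example, a 25 ms window with a 10 ms shift at 16000 Hz gives `wlen=400` and `inc=160`, while a 40 ms window with a 20 ms shift at 44100 Hz gives `wlen=1764` and `inc=882`.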
This section controls the Mel filter settings.
Key | Value | Notes |
---|---|---|
fbankFlag | 0/1 | 0: extract MFCC; 1: extract fbank |
bankNum | >0 | The number of Mel filters |
MFCCNum | >0 && <=bankNum | The number of MFCC coefficients; has no effect if fbankFlag=1 |
MFCC0thFlag | 0/1 | Whether to include the 0th MFCC coefficient; has no effect if fbankFlag=1 |
This section controls some other features.
Key | Value | Notes |
---|---|---|
energyFlag | 0/1 | Whether to include average energy |
zeroCrossingFlag | 0/1 | Whether to include average zero crossing rate |
brightFlag | 0/1 | Whether to include brightness |
subBandEFlag | =0: no sub-band energy; >0: the number of sub-bands | |
fftLength | >=0 | Output the FFT. This is for debugging only. |
This section controls post-processing.
Key | Value | Notes |
---|---|---|
znormFlag | 0/1 | Whether to apply z-norm to each dimension within a single audio file |
regreOrder | =1: no diff; =2: first-order diff; and so on | (The order of the diff) + 1 |
delwin | >0 | The context length of the diff; has no effect if regreOrder=1 |
This section controls how files are read and how features are stored.
Key | Value | Notes |
---|---|---|
fileList | The path of the file list | |
saveType | f/e/n/b | csv(double)/csv(scientific)/npy(numpy)/binary |
numThreads | >0 | The number of threads (only when OpenMP is supported) |
Each line in the list should contain the source WAV and the target feature file, separated by a tab. Please refer to `example/fileList.txt`.
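A malformed list line is easy to miss, so it can help to validate the list before a long run. The sketch below splits one tab-separated line into its source and target paths. `split_list_line` is a hypothetical helper, not a function from this repository, and rejecting lines with an empty field is an assumption about what counts as valid.

```c
#include <assert.h>
#include <string.h>

/* Split one file-list line "source.wav<TAB>target" in place.
 * Returns 1 and sets *src and *dst on success.
 * Returns 0 if the line has no tab or either field is empty.
 * A trailing line ending is stripped first. */
int split_list_line(char *line, char **src, char **dst)
{
    assert(line != NULL && src != NULL && dst != NULL);
    line[strcspn(line, "\r\n")] = '\0';   /* drop "\n" or "\r\n" */
    char *tab = strchr(line, '\t');
    if (tab == NULL) return 0;
    *tab = '\0';                          /* terminate the source field */
    *src = line;
    *dst = tab + 1;
    return (**src != '\0' && **dst != '\0');
}
```

Running this over every line read with `fgets` and reporting the line numbers that return 0 catches entries separated by spaces instead of a tab.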
- The main interface is `MFCCWapper` in `mfcc.c`, in case you want to extract features from your own code or don't want to deal with so many options.
- The program implements `hmath` and `hsigProcess` following HTK.
- The `masterCPlus` branch is recommended.
- Error checking is not thorough, so please double-check your config and file list.
- Although the code in `masterCPlus` is C++, the style is still C, and so is the memory allocation and release.