This repository is the official implementation of "End-to-End Multi-Modal Speech Recognition on an Air and Bone Conducted Speech Corpus" for TASLP 2023.
- If you only need the modules, run `pip install espnet` first; you can then use the modules in `abc_asr/model`.
- If you want to run the full experiments, you need to install ESPnet and Kaldi correctly first. See Installation. Next, run `pip install -r requirements.txt` to install the required packages.
- Download the dataset.
  Download the ABCS Corpus here: Links.
  Download the noisy air-conducted data (`ns_air_data.zip`) here: [Onedrive] or [Baidu Cloud].
  Unzip the noisy data into the ABCS directory:
  unzip -d <ABCS dir>/Audio/ ns_air_data.zip
- Execute the data preparation script.
For inference only:
python3 data_prep --dataset_root <ABCS dir> --test
For full experiments:
python3 data_prep --dataset_root <ABCS dir>
- Ensure that Kaldi and ESPnet are properly installed in your environment. Next, adjust the third line in `test.sh`:
  export ESPNETROOT=<Your Espnet Root>
- Download the model parameters file here: [Onedrive] or [Baidu Cloud]. Then move it into the results directory:
  mv model.acc.best <Your Path>/abc_asr/results
- Run
  bash test.sh
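
Before launching `test.sh`, it can help to confirm that the steps above left everything in place. The following is a minimal sketch, not part of this repository: the function name `preflight` and its three parameters are hypothetical, and it only checks the paths named in the steps above (`<Your Espnet Root>`, `<Your Path>/abc_asr/results/model.acc.best`, `<ABCS dir>/Audio/`) plus whether the `espnet` package can be imported.

```python
import importlib.util
from pathlib import Path


def preflight(espnet_root, your_path, abcs_dir):
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper: the arguments mirror the placeholders used in the
    steps above (<Your Espnet Root>, <Your Path>, <ABCS dir>).
    """
    problems = []

    # ESPNETROOT is exported on the third line of test.sh; pass the same value here.
    if not Path(espnet_root).is_dir():
        problems.append(f"ESPnet root not found: {espnet_root}")

    # The downloaded parameters file is moved into <Your Path>/abc_asr/results.
    model_file = Path(your_path) / "abc_asr" / "results" / "model.acc.best"
    if not model_file.is_file():
        problems.append(f"Model parameters not found: {model_file}")

    # The noisy data is unzipped into <ABCS dir>/Audio/.
    audio_dir = Path(abcs_dir) / "Audio"
    if not audio_dir.is_dir():
        problems.append(f"Audio directory not found: {audio_dir}")

    # find_spec returns None when a top-level package is not installed.
    if importlib.util.find_spec("espnet") is None:
        problems.append("The espnet Python package is not importable")

    return problems
```

Calling `preflight(...)` with your own paths and printing the returned list shows which step, if any, still needs attention. The check is deliberately shallow: it does not verify the Kaldi installation or the contents of the dataset.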
| Method | SNR=-5dB | SNR=0dB | SNR=5dB | SNR=10dB | SNR=15dB | SNR=20dB | Clean |
|---|---|---|---|---|---|---|---|
| The proposed MMT | 17.5 | 14.9 | 11.8 | 9.4 | 7.9 | 7.1 | 6.7 |
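
To see how much each noise level costs relative to the clean condition, the numbers in the table can be compared directly. The snippet below is a small illustrative sketch: the values are copied from the table above, the helper name `gap_to_clean` is hypothetical, and the metric is simply whatever the table reports.

```python
# Values copied from the results table above (row "The proposed MMT").
results = {
    "SNR=-5dB": 17.5,
    "SNR=0dB": 14.9,
    "SNR=5dB": 11.8,
    "SNR=10dB": 9.4,
    "SNR=15dB": 7.9,
    "SNR=20dB": 7.1,
    "Clean": 6.7,
}


def gap_to_clean(scores, reference="Clean"):
    """Return each condition's score minus the score of the reference condition."""
    base = scores[reference]
    # Round to one decimal place to match the precision of the table.
    return {name: round(value - base, 1) for name, value in scores.items()}


gaps = gap_to_clean(results)
for name, gap in gaps.items():
    print(f"{name}: +{gap}")
```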
The training pipeline.
If you found this code helpful, please consider citing it as follows:
@ARTICLE{9961873,
author={Wang, Mou and Chen, Junqi and Zhang, Xiao-Lei and Rahardja, Susanto},
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
title={End-to-End Multi-Modal Speech Recognition on an Air and Bone Conducted Speech Corpus},
year={2023},
volume={31},
number={},
pages={513-524},
keywords={Speech recognition;Speech processing;Signal to noise ratio;Spectrogram;Headphones;Microphones;Synchronization;Speech recognition;multi-modal speech processing;bone conduction;air- and bone-conducted speech corpus},
doi={10.1109/TASLP.2022.3224305}}