Multi-modal Speech Recognition for ABCS Corpus

This repository is the official implementation of "End-to-End Multi-Modal Speech Recognition on an Air and Bone Conducted Speech Corpus" for TASLP 2023.

Installation

  1. If you only need the model modules, run

    pip install espnet
    

    first, and you can use the modules in abc_asr/model.

  2. If you want to run the full experiments, you first need to correctly install ESPnet and Kaldi. See Installation.

    Next, run

    pip install -r requirements.txt
    

    to install the required packages.
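
In either case, you can quickly confirm that the ESPnet dependency is visible to your Python interpreter. This is a minimal sketch; it only checks that the package is installed, nothing more:

    # Minimal check that the espnet package is installed and importable.
    from importlib.metadata import version
    print("espnet", version("espnet"))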

Data Preparation

  1. Download the dataset.

    Download the ABCS Corpus here: Links.

    Download the noisy air conducted data (ns_air_data.zip) here: [Onedrive] or [Baidu Cloud]

    Unzip the noisy data into the ABCS directory:

    unzip -d <ABCS dir>/Audio/ ns_air_data.zip
    
  2. Execute the data preparation script.

    For inference only:

    python3 data_prep.py --dataset_root <ABCS dir> --test
    

    For full experiments:

    python3 data_prep.py --dataset_root <ABCS dir>
    
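Before running the preparation script, it may help to confirm that the noisy data was unzipped into the expected place. This is a minimal sketch; the only path taken from this README is <ABCS dir>/Audio, and the subfolder listing is just a quick visual check:

    # Minimal sketch: confirm the Audio directory exists after the unzip step.
    import sys
    from pathlib import Path

    abcs_root = Path(sys.argv[1])   # pass your <ABCS dir> as the first argument
    audio_dir = abcs_root / "Audio"
    assert audio_dir.is_dir(), f"{audio_dir} not found -- check the unzip step"
    print("Audio subfolders:", sorted(p.name for p in audio_dir.iterdir() if p.is_dir()))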

Inference

  1. Ensure that Kaldi and ESPnet are properly installed in your environment. Then, correctly set the third line of test.sh:

    export ESPNETROOT=<Your Espnet Root>
    
  2. Download the model parameter file here: [Onedrive] or [Baidu Cloud]. Then move it into the results directory:

    mv model.acc.best <Your Path>/abc_asr/results
    
  3. Run

    bash test.sh
    
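A quick way to check that the downloaded checkpoint is intact is to load it with PyTorch. This is a minimal sketch and assumes the file is a standard torch-serialized checkpoint; the path mirrors the destination used in step 2:

    # Minimal sketch: verify the downloaded checkpoint deserializes with PyTorch.
    import torch

    ckpt = torch.load("results/model.acc.best", map_location="cpu")
    print("Loaded checkpoint of type:", type(ckpt).__name__)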

Results (CER %)

| Model            | SNR=-5dB | SNR=0dB | SNR=5dB | SNR=10dB | SNR=15dB | SNR=20dB | Clean |
|------------------|----------|---------|---------|----------|----------|----------|-------|
| The proposed MMT | 17.5     | 14.9    | 11.8    | 9.4      | 7.9      | 7.1      | 6.7   |

TODO

The training pipeline.

Citing

If you find this code helpful, please consider citing it as follows:

@ARTICLE{9961873,
  author={Wang, Mou and Chen, Junqi and Zhang, Xiao-Lei and Rahardja, Susanto},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, 
  title={End-to-End Multi-Modal Speech Recognition on an Air and Bone Conducted Speech Corpus}, 
  year={2023},
  volume={31},
  number={},
  pages={513-524},
  keywords={Speech recognition;Speech processing;Signal to noise ratio;Spectrogram;Headphones;Microphones;Synchronization;Speech recognition;multi-modal speech processing;bone conduction;air- and bone-conducted speech corpus},
  doi={10.1109/TASLP.2022.3224305}}
