Kaldi has some existing systems for audio-based language ID (see the egs/lre* directories), but their training datasets are inaccessible. It is probably most straightforward to build our own training set using the language labels in Mozilla Common Voice and the labels implied by the datasets at https://github.com/google/language-resources/
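As a concrete sketch of the dataset-building step: Common Voice ships one `validated.tsv` per locale (with a `path` column pointing into a `clips/` directory), and NeMo's speech classification recipes consume JSON-lines manifests with `audio_filepath`, `duration`, and `label` fields. The function below is a hypothetical converter; the `duration` placeholder would need to be filled in by probing each audio file.

```python
import csv
import io
import json


def cv_tsv_to_manifest(tsv_text, locale, clips_dir):
    """Turn the text of a Common Voice validated.tsv into a list of
    manifest entries labelled with the locale. Illustrative only:
    column names follow the Common Voice release layout, and the
    entry schema follows NeMo's manifest convention."""
    entries = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        entries.append({
            "audio_filepath": f"{clips_dir}/{row['path']}",
            "duration": None,  # placeholder: read from the audio header
            "label": locale,
        })
    return entries


if __name__ == "__main__":
    tsv = "client_id\tpath\tsentence\nabc\tclip1.mp3\thello\n"
    for entry in cv_tsv_to_manifest(tsv, "en", "cv/en/clips"):
        print(json.dumps(entry))
```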
Data augmentation is probably a must, since our data is noisier than these source datasets. SpecAugment is a reasonable starting point.
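For reference, the masking half of SpecAugment (it also defines time warping, omitted here) is just zeroing out random frequency and time bands of the spectrogram. A minimal NumPy sketch, with illustrative mask widths rather than the paper's values:

```python
import numpy as np


def spec_augment(spec, n_freq_masks=2, n_time_masks=2,
                 max_freq_width=8, max_time_width=20, rng=None):
    """Apply SpecAugment-style frequency and time masking to a
    (freq_bins, time_steps) log-mel spectrogram; returns a masked copy."""
    rng = rng or np.random.default_rng()
    spec = spec.copy()
    n_freq, n_time = spec.shape
    for _ in range(n_freq_masks):
        width = int(rng.integers(0, max_freq_width + 1))
        start = int(rng.integers(0, max(1, n_freq - width + 1)))
        spec[start:start + width, :] = 0.0  # mask a band of mel bins
    for _ in range(n_time_masks):
        width = int(rng.integers(0, max_time_width + 1))
        start = int(rng.integers(0, max(1, n_time - width + 1)))
        spec[:, start:start + width] = 0.0  # mask a span of frames
    return spec
```

In practice NeMo provides this as a built-in augmentation, so we likely would not hand-roll it; the sketch is just to make the technique concrete.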
Ideally the model shouldn't be very big. The goal is to get a good sense of our language breakdown from audio, not to build a highly accurate model.
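Since the end product is a breakdown rather than per-clip accuracy, the final step is just aggregating per-clip predictions into fractions. A trivial sketch (the label names are placeholders):

```python
from collections import Counter


def language_breakdown(predicted_labels):
    """Turn per-clip predicted language labels into fractional shares,
    e.g. ["en", "en", "de"] -> {"en": 0.667, "de": 0.333}."""
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}
```

This also suggests the evaluation that matters here: aggregate error on the breakdown, which is more forgiving than per-clip accuracy, so a small model should suffice.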
Building on top of the speech classification workflow in NeMo seems like a reasonable first step: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/speech_classification/intro.html
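A rough shape of what that training config might look like, tying the pieces above together. This is a hypothetical sketch: the field names follow the general pattern of NeMo's ASR classification configs but should be checked against the docs, and the label set and hyperparameters are placeholders.

```yaml
# Illustrative NeMo-style speech classification config for language ID
model:
  labels: [en, de, fr, es]            # placeholder language set
  train_ds:
    manifest_filepath: train_manifest.json
    labels: [en, de, fr, es]
    batch_size: 64
  validation_ds:
    manifest_filepath: val_manifest.json
    labels: [en, de, fr, es]
    batch_size: 64
  spec_augment:                       # SpecAugment, per the note above
    freq_masks: 2
    time_masks: 2
    freq_width: 15
    time_width: 25
```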