Kaldi has some existing systems for audio-based language ID (see the egs/lre* directories), but their training datasets are inaccessible. It is probably most straightforward to build our own training set using the language labels in Mozilla Common Voice and the labels implied by the datasets at https://github.com/google/language-resources/
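As a concrete sketch of the dataset-building step: Common Voice ships one `validated.tsv` per locale (with a `path` column pointing into a `clips/` directory), and NeMo's speech classification recipes consume JSON-lines manifests with `audio_filepath`, `duration`, and `label` fields. The function below is a hypothetical converter; the `duration` placeholder would need to be filled in by probing each audio file.

```python
import csv
import io
import json


def cv_tsv_to_manifest(tsv_text, locale, clips_dir):
    """Turn the text of a Common Voice validated.tsv into a list of
    manifest entries labelled with the locale. Illustrative only:
    column names follow the Common Voice release layout, and the
    entry schema follows NeMo's manifest convention."""
    entries = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        entries.append({
            "audio_filepath": f"{clips_dir}/{row['path']}",
            "duration": None,  # placeholder: read from the audio header
            "label": locale,
        })
    return entries


if __name__ == "__main__":
    tsv = "client_id\tpath\tsentence\nabc\tclip1.mp3\thello\n"
    for entry in cv_tsv_to_manifest(tsv, "en", "cv/en/clips"):
        print(json.dumps(entry))
```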
Data augmentation is probably a must, since our data is noisier than these source datasets. SpecAugment is a reasonable starting point.
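For reference, the masking half of SpecAugment (it also defines time warping, omitted here) is just zeroing out random frequency and time bands of the spectrogram. A minimal NumPy sketch, with illustrative mask widths rather than the paper's values:

```python
import numpy as np


def spec_augment(spec, n_freq_masks=2, n_time_masks=2,
                 max_freq_width=8, max_time_width=20, rng=None):
    """Apply SpecAugment-style frequency and time masking to a
    (freq_bins, time_steps) log-mel spectrogram; returns a masked copy."""
    rng = rng or np.random.default_rng()
    spec = spec.copy()
    n_freq, n_time = spec.shape
    for _ in range(n_freq_masks):
        width = int(rng.integers(0, max_freq_width + 1))
        start = int(rng.integers(0, max(1, n_freq - width + 1)))
        spec[start:start + width, :] = 0.0  # mask a band of mel bins
    for _ in range(n_time_masks):
        width = int(rng.integers(0, max_time_width + 1))
        start = int(rng.integers(0, max(1, n_time - width + 1)))
        spec[:, start:start + width] = 0.0  # mask a span of frames
    return spec
```

In practice NeMo provides this as a built-in augmentation, so we likely would not hand-roll it; the sketch is just to make the technique concrete.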
Ideally the model shouldn't be very big. The goal is to get a good sense of our language breakdown from audio, not to build a highly accurate model.
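Since the end product is a breakdown rather than per-clip accuracy, the final step is just aggregating per-clip predictions into fractions. A trivial sketch (the label names are placeholders):

```python
from collections import Counter


def language_breakdown(predicted_labels):
    """Turn per-clip predicted language labels into fractional shares,
    e.g. ["en", "en", "de"] -> {"en": 0.667, "de": 0.333}."""
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}
```

This also suggests the evaluation that matters here: aggregate error on the breakdown, which is more forgiving than per-clip accuracy, so a small model should suffice.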
Building on top of the speech classification workflow in NeMo seems like a reasonable first step: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/speech_classification/intro.html
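A rough shape of what that training config might look like, tying the pieces above together. This is a hypothetical sketch: the field names follow the general pattern of NeMo's ASR classification configs but should be checked against the docs, and the label set and hyperparameters are placeholders.

```yaml
# Illustrative NeMo-style speech classification config for language ID
model:
  labels: [en, de, fr, es]            # placeholder language set
  train_ds:
    manifest_filepath: train_manifest.json
    labels: [en, de, fr, es]
    batch_size: 64
  validation_ds:
    manifest_filepath: val_manifest.json
    labels: [en, de, fr, es]
    batch_size: 64
  spec_augment:                       # SpecAugment, per the note above
    freq_masks: 2
    time_masks: 2
    freq_width: 15
    time_width: 25
```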