
Create audio-based language-id system #34

Open
galv opened this issue Jun 23, 2021 · 0 comments

galv (Collaborator) commented Jun 23, 2021

Kaldi has some existing recipes for audio-based language ID (see the `egs/lre*` directories), but their training data is not publicly accessible. The most straightforward option is probably to build our own training set, using the language labels in Mozilla Common Voice plus the labels implied by which language each dataset here covers: https://github.com/google/language-resources/
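Since each source only contributes a language label per clip, a first step is pooling everything into one labeled list and carving out a dev set. A minimal sketch, assuming each source has already been reduced to `(audio_path, language_code)` pairs; the helper `stratified_split` and the 10% dev fraction are hypothetical choices, not something decided in this issue. Splitting per language keeps rare languages represented in evaluation.

```python
import random
from collections import defaultdict


def stratified_split(clips, dev_fraction=0.1, seed=0):
    """Split (audio_path, language) pairs into train/dev lists, per language.

    Hypothetical helper: every language with at least two clips gets at
    least one dev clip, so low-resource languages still show up in dev.
    """
    by_lang = defaultdict(list)
    for path, lang in clips:
        by_lang[lang].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, dev = [], []
    for lang in sorted(by_lang):
        paths = sorted(by_lang[lang])
        rng.shuffle(paths)
        n_dev = int(len(paths) * dev_fraction)
        if len(paths) >= 2:
            n_dev = max(1, n_dev)
        dev += [(p, lang) for p in paths[:n_dev]]
        train += [(p, lang) for p in paths[n_dev:]]
    return train, dev
```

For example, 20 English clips and 3 Yoruba clips yield 2 English and 1 Yoruba dev clip, with the remaining 20 clips in train.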

Building on top of the speech classification workflow in NeMo seems like a reasonable first step: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/speech_classification/intro.html
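NeMo's classification workflow reads its data from JSON-lines manifests. A minimal sketch of writing one, assuming entries keyed by `audio_filepath`, `duration`, and `label`, and assuming clips have already been converted to PCM WAV (Common Voice ships MP3). The field names should be checked against the linked docs for the NeMo version in use; `wav_duration` and `write_manifest` are hypothetical helpers.

```python
import json
import wave


def wav_duration(path):
    """Duration in seconds of a PCM WAV file, read from its header."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def write_manifest(clips, manifest_path):
    """Write (wav_path, language) pairs as one JSON object per line.

    Assumes NeMo-style classification manifest fields (audio_filepath,
    duration, label); confirm them against your NeMo version.
    """
    with open(manifest_path, "w", encoding="utf-8") as f:
        for path, lang in clips:
            entry = {
                "audio_filepath": path,
                "duration": wav_duration(path),
                "label": lang,  # language code used as the class label
            }
            f.write(json.dumps(entry) + "\n")
```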

Data augmentation is probably a must, since our data is noisier than these source datasets. Start with SpecAugment.
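To illustrate what SpecAugment does, here is a minimal sketch of its frequency- and time-masking steps on a spectrogram stored as a list of frequency rows. It deliberately omits SpecAugment's time-warping step, and the mask counts and widths are illustrative defaults, not tuned values. NeMo's ASR configs already include a spectrogram-augmentation stage, so in practice one would tune that rather than hand-roll this.

```python
import random


def spec_augment(spec, num_freq_masks=2, max_freq_width=8,
                 num_time_masks=2, max_time_width=20, rng=None):
    """Return a masked copy of spec (F rows, each with T frames).

    Frequency and time masking only (no time warping). Masked cells are
    set to 0.0, which equals the mean if features are mean-normalized.
    """
    rng = rng or random.Random()
    out = [row[:] for row in spec]  # never modify the caller's features
    n_freq = len(out)
    n_time = len(out[0]) if n_freq else 0

    # Frequency masks: zero out a band of consecutive frequency bins.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, n_freq))
        start = rng.randint(0, n_freq - width)
        for f in range(start, start + width):
            out[f] = [0.0] * n_time

    # Time masks: zero out a run of consecutive frames across all bins.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, n_time))
        start = rng.randint(0, n_time - width)
        for f in range(n_freq):
            for t in range(start, start + width):
                out[f][t] = 0.0
    return out
```

Masking is applied freshly each time a clip is loaded, so the model sees a different corrupted view of each utterance on every epoch.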

Ideally the model should be small. The goal is a good sense of our language breakdown based on audio, not a highly accurate model.
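Because the target is the overall language breakdown rather than per-clip accuracy, one option is to aggregate the classifier's soft outputs over the whole corpus instead of counting top-1 predictions. A minimal sketch, assuming the model yields a posterior distribution per clip (for example a softmax) and that the breakdown is measured in hours of audio; `language_breakdown` is a hypothetical helper.

```python
from collections import defaultdict


def language_breakdown(predictions):
    """Estimate the share of audio per language.

    predictions: iterable of (duration_seconds, {language: probability}),
    with each posterior assumed to sum to 1. Each clip's duration is
    spread across languages by its posterior instead of being assigned
    wholesale to the argmax language.
    """
    totals = defaultdict(float)
    total_duration = 0.0
    for duration, posterior in predictions:
        total_duration += duration
        for lang, p in posterior.items():
            totals[lang] += duration * p
    if total_duration == 0:
        return {}
    return {lang: s / total_duration for lang, s in totals.items()}
```

For example, a 10 s clip that is confidently English plus a 30 s clip split 50/50 between English and Spanish gives `{"en": 0.625, "es": 0.375}`, whereas argmax counting would have to assign the ambiguous clip entirely to one language.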
