Speech to Text Recognition

This repository uses Hugging Face and its Transformers library

The Hugging Face Transformers library is one of the most popular NLP libraries, with over 68,000 stars on GitHub.
It provides state-of-the-art natural language processing models and a very clean API.
The Transformers library works with deep learning libraries such as PyTorch and TensorFlow.
It supports Python 3.6+, PyTorch 1.1.0+, TensorFlow 2.0+, and Flax.
It can be used to perform speech-to-text conversion in German.

Selection of ASR Model

Our research surveys ASR model architectures that are pretrained on German and evaluates state-of-the-art open-source German models on diverse datasets.
Compared with English, fewer benchmark results have been published for German.

Considerations for selection

Speaker Independent
Continuous and Spontaneous Speech
Large Vocabulary
Open Source and Pretrained

Selected Models

wav2vec2.0
Conformer Transducer
Conformer CTC
QuartzNet
ContextNet
Citrinet
SpeechBrain

Model wav2vec2.0

This model was developed by Facebook AI Research (FAIR).
Several pretrained models are available on Hugging Face.
Different fine-tuned versions of this model on the latest Common Voice datasets (V-6.0, V-7.0, V-8.0, V-9.0) are also available.
We selected one pretrained model, wav2vec2 large xlsr-53-german by Facebook,
and a version of wav2vec2 large xlsr-53-german fine-tuned on Common Voice 6.0 by Jonatas Grosman.
We chose the latter because it had the lowest self-reported WER on Common Voice (12.06%), compared with 18.5% reported for the original model provided by FAIR.
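
The WER (word error rate) figures above count the word-level substitutions, deletions, and insertions needed to turn a transcript into the reference, divided by the number of reference words. The following is a minimal sketch of that calculation, not the evaluation script behind the reported numbers; the function name word_error_rate and the example sentences are assumptions made for illustration, and text normalization is limited to lowercasing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example sentences: one substituted word out of five.
reference = "das ist ein kurzer test"
hypothesis = "das ist ein kurze test"
print(f"WER: {word_error_rate(reference, hypothesis):.2%}")  # WER: 20.00%
```

Published WER values usually depend on how punctuation and casing are normalized before scoring, so numbers from different model cards are only roughly comparable.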

Why XLSR-53

The model facebook wav2vec2 large xlsr-53-german is an automatic speech recognition (ASR) model implemented in the Transformers library and typically used from Python.

It is a multilingual pretrained wav2vec 2.0 model (XLSR).
Its architecture is based on the Transformer encoder.
The pretrained model was trained on Multilingual LibriSpeech, Common Voice, and Babel.

Model usage

This model is available through the transformers Python library.
Downloading and using any of the pretrained models for our task
takes only a few lines of code (PyTorch version).

Usage in Python


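# Import the CTC model, its processor, and audio utilities
import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Define the model repo
model_name = "facebook/wav2vec2-large-xlsr-53-german"

# Download the PyTorch model and its processor (feature extractor + tokenizer)
model = Wav2Vec2ForCTC.from_pretrained(model_name)
processor = Wav2Vec2Processor.from_pretrained(model_name)

# Load a German recording resampled to 16 kHz
# ("sample.wav" is a placeholder path, not a file from this repository)
speech, _ = librosa.load("sample.wav", sr=16_000)

# Transform the waveform into model inputs
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt", padding=True)

# Apply the model and decode the most likely token per frame into text
with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))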
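
Fine-tuned wav2vec 2.0 checkpoints are trained with CTC, so the model emits one token prediction per audio frame and the transcript is obtained by collapsing those frame-level predictions. The sketch below shows the greedy version of that collapse in plain Python, as an illustration of the idea rather than the library's own decoder. The toy vocabulary, the frame ids, and the function name ctc_greedy_decode are all hypothetical; it assumes the blank token has id 0 and that "|" marks word boundaries.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse per-frame token ids into text using the greedy CTC rule."""
    # Step 1: merge runs of identical consecutive ids into a single id.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: drop blank tokens, which only separate repeated characters.
    chars = [id_to_token[i] for i in collapsed if i != blank_id]
    # Step 3: turn word delimiters into spaces.
    return "".join(chars).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary and frame-level predictions.
vocab = {0: "<pad>", 1: "|", 2: "h", 3: "a", 4: "l", 5: "o"}
frames = [2, 2, 0, 3, 4, 0, 4, 4, 5, 1, 0]
print(ctc_greedy_decode(frames, vocab))  # hallo
```

The blank between the two runs of "l" is what allows a double letter to survive the merge step; without it the repeated ids would collapse into one character.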

Link to original and finetuned model
