Stars
A simple, performant re-implementation of AutoVC
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.
PyTorch Implementation of Google's Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This implementation supports both single-, multi-speaker TTS and several techniques …
bfs18 / tacotron2
Forked from NVIDIA/tacotron2Tacotron 2 - PyTorch implementation with faster-than-realtime inference
AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer
Implementation of Generating Diverse High-Fidelity Images with VQ-VAE-2 in PyTorch
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
CMU multilingual speech repository
Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
taneliang / gst-tacotron2
Forked from NVIDIA/mellotronMellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
Graph4nlp is the library for the easy use of Graph Neural Networks for NLP. Welcome to visit our DLG4NLP website (https://dlg4nlp.github.io/index.html) for various learning resources!
Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
中文语言理解测评基准 Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
g2pC: A Context-aware Grapheme-to-Phoneme Conversion module for Chinese
A set of tools to use in Microsoft Azure Form Recognizer and OCR services.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
LaNMT: Latent-variable Non-autoregressive Neural Machine Translation with Deterministic Inference
Microsoft.Recognizers.Text provides recognition and resolution of numbers, units, date/time, etc. in multiple languages (ZH, EN, FR, ES, PT, DE, IT, TR, HI, NL. Partial support for JA, KO, AR, SV).…
Speech Recognition using DeepSpeech2.