Universal Speech Translator (UST) is a project to enable real-time, natural-sounding speech-to-speech translation at near-human quality. This repository open-sources our latest work toward that mission, including code, model architecture designs, data mining pipelines, and benchmarks.
Speech-to-Speech Translation for a Real-World Unwritten Language
We study speech-to-speech translation (S2ST), which translates speech from one language into another, and focus on building systems for languages without standard text writing systems. We use English-Taiwanese Hokkien as a case study and present an end-to-end solution, from training data collection and modeling choices to benchmark dataset release. We open-source our best S2ST models and the S2ST benchmark dataset to facilitate future research in this field.
UnitY: Two-Pass Direct Speech-to-Speech Translation with Discrete Units
We present UnitY, a novel two-pass direct S2ST architecture that first generates textual representations and then predicts discrete acoustic units. We improve model performance through subword prediction in the first-pass decoder, an advanced two-pass decoder architecture and search strategy, and better training regularization. To leverage large amounts of unlabeled text data, we pre-train the first-pass text decoder on a self-supervised denoising auto-encoding task. Experimental evaluations on benchmark datasets at various data scales demonstrate that UnitY outperforms a single-pass speech-to-unit translation model by up to 3.5 ASR-BLEU with a 2.83x decoding speed-up.
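To make the two-pass idea concrete, here is a minimal PyTorch sketch of the architecture: a Transformer speech encoder, a first-pass subword text decoder, and a second-pass unit decoder that attends to the first-pass decoder's hidden states. All module sizes and names are illustrative assumptions, not the released fairseq implementation:

```python
import torch
import torch.nn as nn

class TwoPassS2ST(nn.Module):
    """Illustrative two-pass S2ST model: speech -> text -> discrete units."""

    def __init__(self, d_model=256, text_vocab=1000, unit_vocab=1000):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        # First pass: autoregressive subword text decoder over encoder output.
        self.text_embed = nn.Embedding(text_vocab, d_model)
        self.text_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.text_out = nn.Linear(d_model, text_vocab)
        # Second pass: the unit decoder attends to first-pass decoder states,
        # not to the raw encoder output.
        self.unit_embed = nn.Embedding(unit_vocab, d_model)
        self.unit_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.unit_out = nn.Linear(d_model, unit_vocab)

    def forward(self, speech_feats, text_prev, unit_prev):
        # Causal and padding masks omitted for brevity.
        enc = self.encoder(speech_feats)                                 # (B, T_src, D)
        text_h = self.text_decoder(self.text_embed(text_prev), enc)     # first pass
        unit_h = self.unit_decoder(self.unit_embed(unit_prev), text_h)  # second pass
        return self.text_out(text_h), self.unit_out(unit_h)

model = TwoPassS2ST()
feats = torch.randn(2, 50, 256)          # dummy speech encoder input
text_logits, unit_logits = model(
    feats,
    torch.randint(0, 1000, (2, 10)),     # teacher-forced subword targets
    torch.randint(0, 1000, (2, 40)))     # teacher-forced acoustic units
```

Because the second pass conditions on the first-pass decoder states rather than the encoder output alone, the unit decoder can stay shallow, which is one source of the decoding speed-up reported above.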
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations [Project Page]
We present SpeechMatrix, a large-scale multilingual corpus of speech-to-speech translations mined from real speech of European Parliament recordings. It contains speech alignments in 136 language pairs with a total of 418 thousand hours of speech. To evaluate the quality of this parallel speech, we train bilingual speech-to-speech translation models on mined data only and establish extensive baseline results on EuroParl-ST, VoxPopuli and FLEURS test sets. Enabled by the multilinguality of SpeechMatrix, we also explore multilingual speech-to-speech translation, demonstrating that model pre-training and sparse scaling using Mixture-of-Experts bring large gains to translation performance. The mined data and models are freely available.
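As a toy illustration of the mining step, the sketch below scores candidate utterance pairs with a ratio-margin criterion over fixed-size speech embeddings, in the spirit of margin-based bitext mining. The real SpeechMatrix pipeline mines at scale (typically with approximate nearest-neighbor indexes such as FAISS) over learned multilingual speech encoders; this brute-force NumPy version is only for exposition, and the embeddings here are random stand-ins:

```python
import numpy as np

def margin_scores(src_emb, tgt_emb, k=4):
    """Ratio-margin scores between L2-normalized utterance embeddings."""
    sim = src_emb @ tgt_emb.T  # cosine similarity matrix, shape (n_src, n_tgt)
    # Average similarity to each side's k nearest neighbors.
    knn_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1)  # per source utterance
    knn_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0)  # per target utterance
    # margin(x, y) = cos(x, y) / ((avg kNN sim of x + avg kNN sim of y) / 2)
    return sim / ((knn_src[:, None] + knn_tgt[None, :]) / 2)

rng = np.random.default_rng(0)
src = rng.normal(size=(5, 16)); src /= np.linalg.norm(src, axis=1, keepdims=True)
tgt = rng.normal(size=(6, 16)); tgt /= np.linalg.norm(tgt, axis=1, keepdims=True)

scores = margin_scores(src, tgt)
best = scores.argmax(axis=1)  # candidate target utterance for each source one
```

Pairs whose margin score exceeds a tuned threshold are kept as mined speech-to-speech alignments; the margin normalization penalizes "hub" utterances that are close to everything.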
w2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training
We implement and release w2v-BERT, a state-of-the-art self-supervised speech representation learning algorithm that combines contrastive learning and masked language modeling to pre-train speech encoders. We also provide checkpoints pre-trained on the Libri-Light corpus, as well as checkpoints fine-tuned on the LibriSpeech 100h/960h subsets, to facilitate future research.
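A hedged sketch of loading one of the released checkpoints with fairseq and extracting speech representations is shown below. The checkpoint path is a placeholder, and the exact feature-extraction entry point depends on the released model class:

```python
import torch
from fairseq import checkpoint_utils

# Placeholder path; see the repository's download links for real checkpoints.
ckpt_path = "/path/to/w2v_bert_checkpoint.pt"
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task([ckpt_path])
model = models[0].eval()

wav = torch.randn(1, 16000)  # one second of dummy 16 kHz audio
with torch.no_grad():
    # The exact signature depends on the released model class; wav2vec-style
    # fairseq models expose extract_features(...) for downstream features.
    feats = model.extract_features(source=wav, padding_mask=None)
```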
ASR-BLEU Evaluation Toolkit
We provide a reproducible way to compute the ASR-BLEU metric, along with a set of publicly available ASR models that cover all target languages we use. A helper script transcribes a model's audio predictions and computes a BLEU score between the predicted transcriptions and the reference translations.
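The metric itself is straightforward to sketch: transcribe the translated audio with ASR, then score the transcripts against the reference translations with sacrebleu. The transcript strings below are dummies standing in for the helper script's ASR output:

```python
import sacrebleu

# Transcripts of the model's predicted audio (normally produced by the
# bundled ASR models); dummy strings here for illustration.
hypotheses = ["hello world", "good morning everyone"]
# One reference translation per utterance, as a single reference stream.
references = [["hello world", "good morning everyone"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"ASR-BLEU: {bleu.score:.2f}")
```

Pinning the ASR models and the BLEU implementation is what makes the metric reproducible across papers: two systems scored with different ASR front-ends are otherwise not comparable.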
Simple and Effective Unsupervised Speech Translation [Project Page coming soon]
The amount of labeled data available for training speech models is limited for most languages, and this scarcity is exacerbated for speech translation, which requires labeled data covering two different languages. To address this issue, we study a simple and effective approach for building speech translation systems without any labeled data, leveraging recent advances in unsupervised speech recognition, machine translation, and speech synthesis. We further present an unsupervised domain adaptation technique for pre-trained speech models that improves the performance of downstream unsupervised speech recognition, especially in low-resource settings. Experiments show that unsupervised speech-to-text translation outperforms the previous unsupervised state of the art by 3.2 BLEU on the Libri-Trans benchmark. On CoVoST 2, our best systems outperform the best supervised end-to-end models (without pre-training) from only two years ago by an average of 5.0 BLEU over five X-En directions. We also report competitive results on the MuST-C and CVSS benchmarks.
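The approach described above composes unsupervised components into a cascade. The sketch below shows that composition with hypothetical placeholder functions for each stage (unsupervised ASR, unsupervised MT, and TTS for the speech-to-speech case); none of these names come from the released code:

```python
def unsupervised_asr(src_audio: bytes) -> str:
    """Placeholder: transcribe source speech with an unsupervised ASR model
    (e.g. a wav2vec-U-style system trained without transcripts)."""
    return "<source transcript>"

def unsupervised_mt(src_text: str) -> str:
    """Placeholder: translate with an MT model trained on monolingual data."""
    return "<target translation>"

def tts(tgt_text: str) -> bytes:
    """Placeholder: synthesize target-language speech from text."""
    return b"<target audio>"

def unsupervised_s2st(src_audio: bytes) -> bytes:
    """Chain the three unlabeled-data stages into speech-to-speech translation;
    stopping after unsupervised_mt gives speech-to-text translation."""
    return tts(unsupervised_mt(unsupervised_asr(src_audio)))

print(unsupervised_s2st(b"<source audio>"))
```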
The UST code and fairseq(-py) are MIT-licensed; the license is available in the LICENSE file.