End-to-End Speech Translation Progress

Industry News

Tutorial

EACL 2021 tutorial: Speech Translation
Blog: Getting Started with End-to-End Speech Translation
ACL 2020 Theme paper: Speech Translation and the End-to-End Promise: Taking Stock of Where We Are
INTERSPEECH 2019 survey talk: Spoken Language Translation

Data

Open Speech Language Resources: OSLR
Open Speech Corpora: Open Speech Corpora
Informatics Research Data Repository: IRDR
UA Speech Database: UASD

Corpus	Direction	Target	Duration	License
CoVoST 2	{Fr, De, Es, Ca, It, Ru, Zh, Pt, Fa, Et, Mn, Nl, Tr, Ar, Sv, Lv, Sl, Ta, Ja, Id, Cy} -> En and En -> {De, Ca, Zh, Fa, Et, Mn, Tr, Ar, Sv, Lv, Sl, Ta, Ja, Id, Cy}	Text	2880h	CC0
CVSS	{Fr, De, Es, Ca, It, Ru, Zh, Pt, Fa, Et, Mn, Nl, Tr, Ar, Sv, Lv, Sl, Ta, Ja, Id, Cy} -> En	Text & Speech	1900h	CC BY 4.0
mTEDx	{Es, Fr, Pt, It, Ru, El} -> En, {Fr, Pt, It} -> Es, Es -> {Fr, It}, {Es,Fr} -> Pt	Text	765h	CC BY-NC-ND 4.0
CoVoST	{Fr, De, Nl, Ru, Es, It, Tr, Fa, Sv, Mn, Zh} -> En	Text	700h	CC0
MuST-C & MuST-Cinema	En -> {De, Es, Fr, It, Nl, Pt, Ro, Ru, Ar, Cs, Fa, Tr, Vi, Zh}	Text	504h	CC BY-NC-ND 4.0
How2	En -> Pt	Text	300h	YouTube & CC BY-SA 4.0
Augmented LibriSpeech	En -> Fr	Text	236h	CC BY 4.0
Europarl-ST	{En, Fr, De, Es, It, Pt, Pl, Ro, Nl} -> {En, Fr, De, Es, It, Pt, Pl, Ro, Nl}	Text	280h	CC BY-NC 4.0
Kosp2e	Ko -> En	Text	198h	Mixed CC
Fisher + Callhome	Es -> En	Text	160h+20h	LDC
MaSS	parallel among En, Es, Eu, Fi, Fr, Hu, Ro and Ru	Text & Speech	172h	Bible.is
LibriVoxDeEn	De -> En	Text	110h	CC BY-NC-SA 4.0
Prabhupadavani	parallel among En, Fr, De, Gu, Hi, Hu, Id, It, Lv, Lt, Ne, Fa, Pl, Pt, Ru, Sl, Sk, Es, Se, Ta, Te, Tr, Bg, Hr, Da and Nl	Text	94h
BSTC	Zh -> En	Text	68h
LibriS2S	De <-> En	Text & Speech	52h/57h	CC BY-NC-SA 4.0
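The Direction column uses a compact notation: a brace set on either side of `->` stands for every language in the set, and `<->` means both directions. As a reading aid, the following is a minimal sketch of how that notation could be expanded into explicit (source, target) pairs. The function name `expand_directions` is hypothetical and not part of any corpus toolkit. Skipping identical source and target codes (relevant for Europarl-ST) is an assumption of this sketch, and the "parallel among ..." rows are not handled.

```python
import re

# A "side" is either a bare language code (En) or a brace set ({Fr, It}).
_SIDE = r"(\{[^}]*\}|\w+)"
_CLAUSE = re.compile(rf"{_SIDE}\s*(<->|->)\s*{_SIDE}")


def expand_directions(spec: str) -> list[tuple[str, str]]:
    """Expand a Direction cell into explicit (source, target) pairs.

    Hypothetical helper for illustration only. Clauses may be separated
    by commas or by the word "and"; anything that is not a clause of the
    form '<side> -> <side>' or '<side> <-> <side>' is ignored.
    """
    pairs = []
    for src, arrow, tgt in _CLAUSE.findall(spec):
        sources = [code.strip() for code in src.strip("{}").split(",")]
        targets = [code.strip() for code in tgt.strip("{}").split(",")]
        for s in sources:
            for t in targets:
                if s == t:
                    # Assumption: a language is never paired with itself.
                    continue
                pairs.append((s, t))
                if arrow == "<->":
                    # Bidirectional corpora contribute the reverse pair too.
                    pairs.append((t, s))
    return pairs


if __name__ == "__main__":
    mtedx = "{Es, Fr, Pt, It, Ru, El} -> En, {Fr, Pt, It} -> Es, Es -> {Fr, It}, {Es,Fr} -> Pt"
    for source, target in expand_directions(mtedx):
        print(f"{source} -> {target}")
```

Expanding each row this way makes it easier to check whether a given language pair is covered by a corpus before downloading it.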

Toolkit

Paper

2022

[arXiv] Generating Synthetic Speech from SpokenVocab for Speech Translation
[arXiv] Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data
[arXiv] WACO: Word-Aligned Contrastive Learning for Speech Translation
[arXiv] AdaTranS: Adapting with Boundary-based Shrinking for End-to-End Speech Translation
[arXiv] Attention as a guide for Simultaneous Speech Translation
[arXiv] BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric
[arXiv] Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features
[arXiv] M3ST: Mix at Three Levels for Speech Translation
[arXiv] Improving End-to-end Speech Translation by Leveraging Auxiliary Speech and Text Data
[arXiv] ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic - English
[arXiv] M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation
[arXiv] LibriS2S: A German-English Speech-to-Speech Translation Corpus
[arXiv] Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation
[arXiv] Does Simultaneous Speech Translation need Simultaneous Models?
[arXiv] Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation
[arXiv] Large-Scale Streaming End-to-End Speech Translation with Neural Transducers
[arXiv] GigaST: A 10,000-hour Pseudo Speech Translation Corpus
[arXiv] Multilingual Simultaneous Speech Translation
[arXiv] Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation
[arXiv] Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation
[arXiv] Prabhupadavani: A Code-mixed Speech Translation Data for 25 Languages
[arXiv] CVSS Corpus and Massively Multilingual Speech-to-Speech Translation
[arXiv] SHAS: Approaching optimal Segmentation for End-to-End Speech Translation
[EMNLP Findings] RedApt: An Adaptor for WAV2VEC 2 Encoding Faster and Smaller Speech Translation without Quality Compromise
[INTERSPEECH] Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation
[NAACL] Textless Speech-to-Speech Translation on Real Data
[ICML] Revisiting End-to-End Speech-to-Text Translation From Scratch
[ICML] Translatotron 2: Robust direct speech-to-speech translation
[ACL] Learning When to Translate for Streaming Speech
[ACL] Sample, Translate, Recombine: Leveraging Audio Alignments for Data Augmentation in End-to-end Speech Translation
[ACL] UniST: Unified End-to-end Model for Streaming and Non-streaming Speech Translation
[ACL] Direct speech-to-speech translation with discrete units
[ACL] STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation
[ACL Findings] End-to-End Speech Translation for Code Switched Speech
[ICASSP] Tackling data scarcity in speech translation using zero-shot multilingual machine translation techniques
[NN] Improving data augmentation for low resource speech-to-text translation with diverse paraphrasing
[AAAI] Regularizing End-to-End Speech Translation with Triangular Decomposition Agreement

2021

2020

2019

Contact

Changhan Wang ([email protected])

sukuya/SpeechTransProgress