See: https://espnet.github.io/espnet/espnet2_tutorial.html#recipes-using-espnet2

Directory name | Corpus name | Task | Language | URL | Note |
---|---|---|---|---|---|
aishell | AISHELL-ASR0009-OS1 Open Source Mandarin Speech Corpus | ASR | ZH | http://www.aishelltech.com/kysjcp | |
ami | The AMI Meeting Corpus | ASR | EN | http://groups.inf.ed.ac.uk/ami/corpus/ | |
an4 | CMU AN4 database | ASR/TTS | EN | http://www.speech.cs.cmu.edu/databases/an4/ | |
babel | IARPA Babel corpus | ASR | ~20 Languages | https://www.iarpa.gov/index.php/research-programs/babel | |
chime4 | The 4th CHiME Speech Separation and Recognition Challenge | ASR/Multichannel ASR | EN | http://spandh.dcs.shef.ac.uk/chime_challenge/chime2016/ | |
commonvoice | The Mozilla Common Voice | ASR | 13 Languages | https://voice.mozilla.org/datasets | |
csj | Corpus of Spontaneous Japanese | ASR | JP | https://pj.ninjal.ac.jp/corpus_center/csj/en/ | |
csmsc | Chinese Standard Mandarin Speech Corpus | TTS | ZH | https://www.data-baker.com/open_source.html | |
dirha_wsj | Distant-speech Interaction for Robust Home Applications | Multi-Array ASR | EN | https://dirha.fbk.eu/, https://github.com/SHINE-FBK/DIRHA_English_wsj | |
how2 | How2: A Large-scale Dataset for Multimodal Language Understanding | ASR/Machine Translation/Speech Translation | EN->PT | https://github.com/srvk/how2-dataset | |
jsss | JSSS: Japanese speech corpus for summarization and simplification | TTS | JP | https://sites.google.com/site/shinnosuketakamichi/research-topics/jsss_corpus | |
jsut | Japanese speech corpus of Saruwatari-lab., University of Tokyo | ASR/TTS | JP | https://sites.google.com/site/shinnosuketakamichi/publication/jsut | |
jvs | JVS (Japanese versatile speech) corpus | TTS | JP | https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_corpus | |
laborotv | LaboroTVSpeech (A large-scale Japanese speech corpus on TV recordings) | ASR | JP | https://laboro.ai/column/eg-laboro-tv-corpus-jp | |
librispeech | LibriSpeech ASR corpus | ASR | EN | http://www.openslr.org/12 | |
ljspeech | The LJ Speech Dataset | TTS | EN | https://keithito.com/LJ-Speech-Dataset/ | |
mini_an4 | Mini version of CMU AN4 database for the integration test | ASR/TTS | EN | http://www.speech.cs.cmu.edu/databases/an4/ | |
reverb | REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge | ASR | EN | https://reverb2014.dereverberation.com/ | |
timit | TIMIT Acoustic-Phonetic Continuous Speech Corpus | ASR | EN | https://catalog.ldc.upenn.edu/LDC93S1 | |
vctk | English Multi-speaker Corpus for CSTR Voice Cloning Toolkit | TTS | EN | http://www.udialogue.org/download/cstr-vctk-corpus.html | |
vivos | VIVOS (Vietnamese corpus for ASR) | ASR | VI | https://ailab.hcmus.edu.vn/vivos/ | |
voxforge | VoxForge | ASR | 7 languages | http://www.voxforge.org/ | |
wsj | CSR-I (WSJ0) Complete, CSR-II (WSJ1) Complete | ASR | EN | https://catalog.ldc.upenn.edu/LDC93S6A, https://catalog.ldc.upenn.edu/LDC94S13A | |
wsj0_2mix | MERL WSJ0-mix multi-speaker dataset | ASR/SE | EN | http://www.merl.com/demos/deep-clustering | |
wsj0_2mix_spatialized | MERL WSJ0-mix multi-speaker dataset (Spatialized version) | ASR/Multichannel ASR/SE | EN | http://www.merl.com/demos/deep-clustering | |
yesno | The "yesno" corpus | ASR | HE | http://www.openslr.org/1 | |
zeroth_korean | Zeroth-Korean | ASR | KR | http://www.openslr.org/40 | |