Commit

typos
Rayhane-mamah authored Apr 9, 2018
1 parent e48447f commit 54593a0
Showing 3 changed files with 6 additions and 7 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -42,7 +42,7 @@ Tensorflow implementation of Deep mind's Tacotron-2. A deep neural network archi
The previous tree shows the current state of the repository.

- Step **(0)**: Get your dataset, here I have set the examples of **Ljspeech**, **en_US** and **en_UK** (from **M-AILABS**).
- Step **(1)**: Preprocess your data. This will give you the **trainin_data** folder.
- Step **(1)**: Preprocess your data. This will give you the **training_data** folder.
- Step **(2)**: Train your Tacotron model. Yields the **logs-Tacotron** folder.
- Step **(3)**: Synthesize/Evaluate the Tacotron model. Gives the **tacotron_output** folder.
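
The snippet below is a minimal Python sketch of how steps (1)-(3) might be driven end to end. Only `preprocess.py` and its `dataset`/`language` arguments appear in this commit; `train.py`, `synthesize.py`, and the `--model` flag are assumptions about the repository's CLI, not confirmed by this diff.

```python
import subprocess

# Hypothetical driver for steps (1)-(3); flag names are assumptions,
# so check each script's --help before relying on them.
subprocess.run(["python", "preprocess.py", "--dataset", "M-AILABS",
                "--language", "en_US"], check=True)    # step (1): writes training_data/
subprocess.run(["python", "train.py", "--model", "Tacotron"],
               check=True)                             # step (2): writes logs-Tacotron/
subprocess.run(["python", "synthesize.py", "--model", "Tacotron"],
               check=True)                             # step (3): writes tacotron_output/
```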

@@ -82,7 +82,7 @@ else:
# Dataset:
We tested the code above on the [ljspeech dataset](https://keithito.com/LJ-Speech-Dataset/), which has almost 24 hours of labeled voice recordings from a single actress. (Further info on the dataset is available in the README file when you download it.)

We are also running current tests on the [new L-AILABS speech dataset](http://www.m-ailabs.bayern/en/the-mailabs-speech-dataset/) which contains more than 700h of speech (more than 80 Gb of data) for more than 10 languages.
We are also running current tests on the [new M-AILABS speech dataset](http://www.m-ailabs.bayern/en/the-mailabs-speech-dataset/) which contains more than 700h of speech (more than 80 Gb of data) for more than 10 languages.

After **downloading** the dataset, **extract** the compressed file, and **place the folder inside the cloned repository.**

5 changes: 2 additions & 3 deletions datasets/preprocessor.py
@@ -106,14 +106,13 @@ def _process_utterance(mel_dir, wav_dir, index, wav_path, text):

#Zero pad for quantized signal
out = np.pad(out, (l, r), mode='constant', constant_values=constant_values)
T = mel_spectrogram.shape[0]
time_steps = len(out)
assert time_steps >= T * audio.get_hop_size()
assert time_steps >= n_frames * audio.get_hop_size()

#time resolution adjustment
#ensure length of raw audio is multiple of hop size so that we can use
#transposed convolution to upsample
out = out[:T * audio.get_hop_size()]
out = out[:n_frames * audio.get_hop_size()]
assert time_steps % audio.get_hop_size() == 0

# Write the spectrogram and audio to disk
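This hunk swaps the local `T` for the pre-existing `n_frames` count, but the invariant it enforces is unchanged: after padding and trimming, the quantized audio must contain exactly one hop of samples per mel frame, so a transposed convolution can upsample frame-rate features back to sample rate. A standalone sketch of that invariant, using made-up `hop_size` and `n_frames` values instead of the repository's `audio.get_hop_size()`:

```python
import numpy as np

hop_size = 256                             # assumed hop size; the real value comes from hparams
n_frames = 811                             # mel frames for one utterance
out = np.zeros(n_frames * hop_size + 97)   # padded quantized audio, slightly too long

assert len(out) >= n_frames * hop_size
out = out[:n_frames * hop_size]            # trim to an exact multiple of the hop size
assert len(out) % hop_size == 0
assert len(out) // hop_size == n_frames    # exactly one hop of audio per mel frame
```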
4 changes: 2 additions & 2 deletions preprocess.py
@@ -40,8 +40,8 @@ def norm_data(args):


if args.dataset == 'M-AILABS':
supported_languages = ['en_US', 'en_UK', 'fr_FR', 'it_IT', 'de_DE', 'es-ES', 'ru-RU',
'uk-UK', 'pl-PL', 'nl-NL', 'pt-PT', 'sv-FI', 'sv-SE', 'tr-TR', 'ar-SA']
supported_languages = ['en_US', 'en_UK', 'fr_FR', 'it_IT', 'de_DE', 'es_ES', 'ru_RU',
'uk_UK', 'pl_PL', 'nl_NL', 'pt_PT', 'sv_FI', 'sv_SE', 'tr_TR', 'ar_SA']
if args.language not in supported_languages:
raise ValueError('Please enter a supported language to use from M-AILABS dataset! \n{}'.format(
supported_languages))
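Since the language check is a plain membership test against `supported_languages`, the hyphen-to-underscore normalization above is what decides whether a code is accepted. A small standalone sketch (not a call into the repository) of the effect:

```python
# supported_languages copied from the corrected list in this commit.
supported_languages = ['en_US', 'en_UK', 'fr_FR', 'it_IT', 'de_DE', 'es_ES', 'ru_RU',
                       'uk_UK', 'pl_PL', 'nl_NL', 'pt_PT', 'sv_FI', 'sv_SE', 'tr_TR', 'ar_SA']

for language in ('es_ES', 'es-ES'):
    if language not in supported_languages:
        print('{}: would raise ValueError'.format(language))    # hyphenated form is rejected
    else:
        print('{}: accepted'.format(language))                   # underscore form passes
```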
