Commit

v2.2
neonbjb committed May 6, 2022
1 parent b327be5 commit ffd0238
Showing 32 changed files with 77 additions and 34 deletions.
13 changes: 8 additions & 5 deletions README.md
@@ -9,6 +9,11 @@ This repo contains all the code needed to run Tortoise TTS in inference mode.

 ### New features

+#### v2.2; 2022/5/5
+- Added several new voices from the training set.
+- Automated redaction. Wrap the text you want to use to prompt the model but not be spoken in brackets.
+- Bug fixes
+
 #### v2.1; 2022/5/2
 - Added ability to produce totally random voices.
 - Added ability to download voice conditioning latent via a script, and then use a user-provided conditioning latent.
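
The v2.2 note on automated redaction describes a text convention: bracketed text prompts the model but is not spoken. A minimal sketch of that convention on the text side follows. `split_redacted` is a hypothetical helper for illustration and is not part of Tortoise; how the audio itself is trimmed is not shown in this diff.

```python
import re

# Hypothetical helper, not part of Tortoise: separates bracketed prompt-only
# text from the text that should remain audible.
BRACKETED = re.compile(r"\[([^\[\]]*)\]")


def split_redacted(text):
    """Return (prompt_only_parts, spoken_text) for a bracket-annotated string."""
    prompt_only = [part.strip() for part in BRACKETED.findall(text)]
    spoken = BRACKETED.sub("", text)
    spoken = " ".join(spoken.split())  # collapse whitespace left behind
    return prompt_only, spoken


prompt_only, spoken = split_redacted("[I am so angry,] get out of my house!")
print(prompt_only)  # ['I am so angry,']
print(spoken)       # get out of my house!
```

The sketch handles only non-nested brackets, which is an assumption; the README does not say whether nesting is supported.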
@@ -95,11 +100,9 @@ For the those in the ML space: this is created by projecting a random vector ont

 ### Provided voices

-This repo comes with several pre-packaged voices. You will be familiar with many of them. :)
-
-Most of the provided voices were not found in the training set. Experimentally, it seems that voices from the training set
-produce more realistic outputs then those outside of the training set. Any voice prepended with "train" came from the
-training set.
+This repo comes with several pre-packaged voices. Voices prepended with "train_" came from the training set and perform
+far better than the others. If your goal is high quality speech, I recommend you pick one of them. If you want to see
+what Tortoise can do for zero-shot mimicing, take a look at the others.

 ### Adding a new voice

Binary file added examples/prompting/angry.mp3
Binary file not shown.
Binary file added examples/prompting/happy.mp3
Binary file not shown.
Binary file added examples/prompting/sad.mp3
Binary file not shown.
Binary file added examples/prompting/scared.mp3
Binary file not shown.
4 changes: 0 additions & 4 deletions examples/various/desktop.ini

This file was deleted.

2 changes: 1 addition & 1 deletion setup.py
@@ -6,7 +6,7 @@
 setuptools.setup(
     name="TorToiSe",
     packages=setuptools.find_packages(),
-    version="2.1.3",
+    version="2.2.0",
     author="James Betker",
     author_email="[email protected]",
     description="A high quality multi-voice text-to-speech library",
2 changes: 0 additions & 2 deletions tortoise/models/vocoder.py
@@ -284,8 +284,6 @@ def eval(self, inference=False):
         self.remove_weight_norm()

     def remove_weight_norm(self):
-        print('Removing weight norm...')
-
         nn.utils.remove_weight_norm(self.conv_pre)

         for layer in self.conv_post:
2 changes: 1 addition & 1 deletion tortoise/utils/audio.py
@@ -137,7 +137,7 @@ def __init__(self, filter_length=1024, hop_length=256, win_length=1024,
         self.stft_fn = STFT(filter_length, hop_length, win_length)
         from librosa.filters import mel as librosa_mel_fn
         mel_basis = librosa_mel_fn(
-            sampling_rate, filter_length, n_mel_channels, mel_fmin, mel_fmax)
+            sr=sampling_rate, n_fft=filter_length, n_mels=n_mel_channels, fmin=mel_fmin, fmax=mel_fmax)
         mel_basis = torch.from_numpy(mel_basis).float()
         self.register_buffer('mel_basis', mel_basis)

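
The change above switches the `librosa_mel_fn` call from positional to keyword arguments. As far as I know, newer librosa releases declare these parameters keyword-only, which would make the old positional call fail. The sketch below illustrates that failure mode with a stand-in function. `mel_stand_in` is hypothetical and is not librosa; the numeric values are illustrative only.

```python
def mel_stand_in(*, sr, n_fft, n_mels=128, fmin=0.0, fmax=None):
    """Stand-in with keyword-only parameters; returns the resolved arguments."""
    if fmax is None:
        fmax = sr / 2
    return {"sr": sr, "n_fft": n_fft, "n_mels": n_mels, "fmin": fmin, "fmax": fmax}


# Keyword call, mirroring the updated line in the diff.
args = mel_stand_in(sr=22050, n_fft=1024, n_mels=80, fmin=0.0, fmax=8000.0)
print(args["n_mels"])  # 80

# Positional call, mirroring the removed line: a keyword-only signature rejects it.
try:
    mel_stand_in(22050, 1024, 80, 0.0, 8000.0)
    positional_ok = True
except TypeError:
    positional_ok = False
print(positional_ok)  # False
```

Keyword arguments also keep the call correct if a library reorders its parameters, which is a reasonable motivation for the change regardless of version.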
7 changes: 5 additions & 2 deletions tortoise/utils/wav2vec_alignment.py
@@ -66,7 +66,7 @@ def align(self, audio, expected_text, audio_sample_rate=24000):
         logits = logits[0]
         pred_string = self.tokenizer.decode(logits.argmax(-1).tolist())

-        fixed_expectation = max_alignment(expected_text, pred_string)
+        fixed_expectation = max_alignment(expected_text.lower(), pred_string)
         w2v_compression = orig_len // logits.shape[0]
         expected_tokens = self.tokenizer.encode(fixed_expectation)
         expected_chars = list(fixed_expectation)
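
The `.lower()` change above suggests that `pred_string` is lowercase, so uppercase characters in `expected_text` could never match it. The sketch below shows the effect with a simple greedy in-order matcher. `in_order_matches` is a hypothetical illustration and is not the repository's `max_alignment`; the lowercase recognizer output is an assumption.

```python
def in_order_matches(expected, predicted):
    """Count characters of `expected` found in `predicted`, in order (greedy)."""
    pos = 0
    matched = 0
    for ch in expected:
        found = predicted.find(ch, pos)
        if found != -1:
            matched += 1
            pos = found + 1
    return matched


expected_text = "Hello World"
pred_string = "hello world"  # assumed lowercase recognizer output

print(in_order_matches(expected_text, pred_string))          # 9
print(in_order_matches(expected_text.lower(), pred_string))  # 11
```

Without lower-casing, the two capital letters go unmatched; with it, every character lines up.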
@@ -100,7 +100,10 @@ def pop_till_you_win():
                     break

         pop_till_you_win()
-        assert len(expected_tokens) == 0, "This shouldn't happen. My coding sucks."
+        if not (len(expected_tokens) == 0 and len(alignments) == len(expected_text)):
+            torch.save([audio, expected_text], 'alignment_debug.pth')
+            assert False, "Something went wrong with the alignment algorithm. I've dumped a file, 'alignment_debug.pth' to" \
+                          "your current working directory. Please report this along with the file so it can get fixed."

         # Now fix up alignments. Anything with -1 should be interpolated.
         alignments.append(orig_len)  # This'll get removed but makes the algorithm below more readable.
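
Two details of the new check are worth noting. The adjacent string literals join without a space, so the message reads "...to" followed directly by "your...". Also, `assert` statements are skipped when Python runs with `-O`. The sketch below is one possible alternative that raises an exception instead. `AlignmentError` and `check_alignment` are hypothetical names, and the debug-file dump from the commit is left out to keep the example dependency-free.

```python
class AlignmentError(RuntimeError):
    """Hypothetical exception type; the commit itself uses `assert False`."""


def check_alignment(expected_tokens, alignments, expected_text):
    """Raise if tokens are left over or the alignment count does not match the text."""
    if len(expected_tokens) == 0 and len(alignments) == len(expected_text):
        return
    raise AlignmentError(
        "Something went wrong with the alignment algorithm. "
        f"{len(expected_tokens)} tokens left, "
        f"{len(alignments)} alignments for {len(expected_text)} characters."
    )


check_alignment([], [0, 320, 640], "abc")  # consistent state, passes silently

try:
    check_alignment([7], [0, 320], "abc")
    failed = False
except AlignmentError as err:
    failed = True
    print(err)
```

Raising keeps the check active in optimized runs and lets callers catch the failure, at the cost of diverging from the committed behavior.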
Binary file added tortoise/voices/applejack/1.wav
Binary file not shown.
Binary file added tortoise/voices/applejack/2.wav
Binary file not shown.
Binary file added tortoise/voices/applejack/3.wav
Binary file not shown.
Binary file added tortoise/voices/rainbow/1.wav
Binary file not shown.
Binary file added tortoise/voices/rainbow/2.wav
Binary file not shown.
Binary file added tortoise/voices/rainbow/3.wav
Binary file not shown.
Binary file added tortoise/voices/train_daws/1.mp3
Binary file not shown.
Binary file added tortoise/voices/train_daws/2.mp3
Binary file not shown.
Binary file added tortoise/voices/train_daws/3.mp3
Binary file not shown.
Binary file added tortoise/voices/train_dreams/1.mp3
Binary file not shown.
Binary file added tortoise/voices/train_dreams/2.mp3
Binary file not shown.
Binary file added tortoise/voices/train_dreams/3.mp3
Binary file not shown.
Binary file added tortoise/voices/train_empire/1.mp3
Binary file not shown.
Binary file added tortoise/voices/train_empire/2.mp3
Binary file not shown.
Binary file added tortoise/voices/train_empire/3.mp3
Binary file not shown.
Binary file added tortoise/voices/train_mouse/1.mp3
Binary file not shown.
Binary file added tortoise/voices/train_mouse/2.mp3
Binary file not shown.
Binary file added tortoise/voices/train_mouse/3.mp3
Binary file not shown.
Binary file added tortoise/voices/yannic/00045.mp3
Binary file not shown.
Binary file added tortoise/voices/yannic/00055.mp3
Binary file not shown.
Binary file added tortoise/voices/yannic/00203.mp3
Binary file not shown.
81 changes: 62 additions & 19 deletions tortoise_v2_examples.html

Large diffs are not rendered by default.
