This is an implementation of Melo TTS and OpenVoice Tone Clone by onnxruntime. We restruct the text utils for speeding up. We convert the models into onnx format. The models are stored in this repository as lfs.
We has just implement the zh-mix-en language. Other languages will come soon.
# tts demo
from melo_onnx import MeloTTX_ONNX
import soundfile
model_path = "path/to/folder/of/model_tts"
tts = MeloTTX_ONNX(model_path)
audio = tts.speak("今天天气真nice。", tts.speakers[0])
soundfile.write("path/of/result.wav", audio, samplerate=tts.sample_rate)
# Tone clone demo
from melo_onnx import OpenVoiceToneClone_ONNX
tc = OpenVoiceToneClone_ONNX("path/to/folder/of/model_tone_clone")
import soundfile
tgt = soundfile.read("path/of/audio_for_tone_color", dtype='float32')
tgt = tc.resample(tgt[0], tgt[1])
tgt_tone_color = tc.extract_tone_color(tgt)
src = soundfile.read("path/of/audio_to_change_tone", dtype='float32')
src = tc.resample(src[0], src[1])
result = tc.tone_clone(src, tgt_tone_color)
soundfile.write("path/of/result.wav", result, tc.sample_rate)
Models can be downloaded from the modelscope or huggingface:
model | repositories | comment |
---|---|---|
tts | modelscope or huggingface | zh-mix-en |
tone clone | modelscope or huggingface |
- model_path str. The path of the folder store the model.
- execution_provider str. The device for the onnxruntime, CUDA, CPU, or others. If it's None, the library will choose the better one between the CUDA and CPU.
- verbose bool. Set True for display the detail information when the library working.
- onnx_session_options onnxruntim.SessionOptions. You can setup the special options for the onnx session.
- onnx_params dict. The other parameters you want to pass into the onnxruntim.InferenceSession
- text str. The text you want to synthesis.
- speaker str. The speaker you want to use.
- speed float. The speed of the speech
- sdp_ratio float.
- noise_scale float.
- noise_scale_w float.
- pbar function. Such as tqdm
- Returns numpy.array. The data of the result audio
- speakers: [str]. Readonly. The available speakers
- sample_rate: int. Readonly. The sample rate of the synthesis result
- language: str. Readonly. The language of the current model
- model_path str. The path of the folder store the model.
- execution_provider str. The device for the onnxruntime, CUDA, CPU, or others. If it's None, the library will choose the better one between the CUDA and CPU.
- verbose bool. Set True for display the detail information when the library working.
- onnx_session_options onnxruntim.SessionOptions. You can setup the special options for the onnx session.
- onnx_params dict. The other parameters you want to pass into the onnxruntim.InferenceSession
- audio numpy.array. The data of the audio
- Returns numpy.array. The tone color vector
- color list[numpy.array]. The list of the tone colors you want to mix. Each element should be the result of extract_tone_color.
- Returns numpy.array. The tone color vector
- audio numpy.array. The data of the audio that will be changed the tone
- target_tone_color numpy.array. The tone color that you want to clone. It should be the result of the extract_tone_color or mix_tone_color.
- tau float
- Returns numpy.array. The dest audio
- audio numpy.array. The source audio you want to resample.
- original_rate int. The original sample rate of the source audio
- Returns numpy.array. The dest data of the audio after resample