Update (2025-02-07): Our paper has been released! The Llasa-1B Multilingual version has also been released!
```shell
torchrun --nproc_per_node=8 train_tts.py config.json
```
or
```shell
sbatch run_slurm.sh
```
You can download the tokenized open-source speech data here. It includes LibriHeavy, Emilia (in both Chinese and English), and WenetSpeech4TTS, totaling approximately 160,000 hours of open-source data.
Our models are trained on 250,000 hours of speech data. Of this, 160,000 hours come from the open-source datasets mentioned above, while the remaining 90,000 hours are from internal datasets, which are not yet available for open-source release.
Text_sequence is encoded by the text tokenizer from Llama, for example, Llama-3.2-1B-Instruct.
Speech_sequence is extracted through X-codec2. We offset each speech token's value by len(text tokenizer) + 8 special tokens, thereby forming a unified tokenizer that encompasses both speech and text.
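As a minimal sketch of the unified-vocabulary mapping described above (the exact vocabulary size and the placement of the 8 special tokens are assumptions, not taken from the released code):

```python
# Sketch of the unified tokenizer id mapping: text ids come first,
# then 8 special tokens, then the shifted X-codec2 speech tokens.
TEXT_VOCAB_SIZE = 128_256   # assumed len(text tokenizer) for Llama 3.2
NUM_SPECIAL_TOKENS = 8      # the 8 extra special tokens mentioned above

def speech_to_unified(speech_token: int) -> int:
    """Shift a raw speech token into the unified id space."""
    return speech_token + TEXT_VOCAB_SIZE + NUM_SPECIAL_TOKENS

def unified_to_speech(unified_id: int) -> int:
    """Recover the raw speech token from a unified id."""
    return unified_id - TEXT_VOCAB_SIZE - NUM_SPECIAL_TOKENS

# Round trip: a raw speech token survives the shift and the inverse shift.
raw = 42
assert unified_to_speech(speech_to_unified(raw)) == raw
```

With this offset, text ids and speech ids never collide, so a single language model head can predict both modalities from one vocabulary.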
Coming Soon
Codec: xcodec2 (please install the new version, xcodec2==0.1.3)
Llasa 1b version: Llasa-1B
Llasa 1b Multilingual version: Llasa-1B-Multilingual (Not mentioned in the paper)
Llasa 3b version: Llasa-3B
Llasa 8b version: Llasa-8B