GitHub - davidmokos/FluxMusic: Text-to-Music Generation with Rectified Flow Transformers

FluxMusic: Text-to-Music Generation with Rectified Flow Transformer
Official PyTorch Implementation

This repo contains PyTorch model definitions, pre-trained weights, and training/sampling code for the paper "Flux that Plays Music". It explores a simple extension of diffusion-based rectified flow Transformers for text-to-music generation. The model architecture can be seen as follows:

To-do list

training / inference scripts
clean code
all ckpts
gradio demo, webpage for audio samples

Requirements

To install the requirements, run:

pip install -r requirements.txt

1. Training

Refer to the link to set up the runtime environment.

To launch latent-space training of the small version with N GPUs on one node using PyTorch DDP:

torchrun --nnodes=1 --nproc_per_node=N train.py \
--version small \
--data-path xxx \
--global_batch_size 128

Scripts for the other model sizes can be found in the scripts directory.
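As a rough illustration of how the global batch size relates to the number of GPUs, the sketch below splits --global_batch_size evenly across N DDP processes (one process per GPU). This is a common convention in DDP training scripts. Whether train.py does exactly this is an assumption, and the helper name per_device_batch_size is hypothetical.

```python
def per_device_batch_size(global_batch_size: int, world_size: int) -> int:
    """Split a global batch evenly across DDP processes (one per GPU).

    Hypothetical helper: it mirrors a common DDP convention and is not
    taken from train.py.
    """
    if world_size <= 0:
        raise ValueError("world_size must be a positive integer")
    if global_batch_size % world_size != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible "
            f"by world size {world_size}"
        )
    return global_batch_size // world_size


# Show the per-GPU batch for a few choices of N with --global_batch_size 128.
for n_gpus in (1, 2, 4, 8):
    print(f"N={n_gpus}: {per_device_batch_size(128, n_gpus)} samples per GPU")
```

Under this convention, choosing N so that it divides the global batch size keeps the effective batch identical across hardware setups.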

2. Inference

We include a sample.py script that samples music clips from a FluxMusic model according to the given conditions:

python sample.py \
--version small \
--ckpt_path /path/to/model \
--audioldm2_model_path /path/to/audioldm2_model \
--prompt_file config/example.txt

To get the ckpt_path, download one of the FluxMusic-[Small/Base/Large/Giant] ckpts from the table below.

To get the audioldm2_model_path, clone the AudioLDM2 repository. Make sure you have git-lfs installed.

git lfs install
git clone https://huggingface.co/cvssp/audioldm2
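If git-lfs was not active during the clone, the large weight files are left as small text stubs (Git LFS pointer files) instead of the real weights. The sketch below scans a cloned directory for such stubs by checking for the standard pointer header. The function name find_lfs_pointers is hypothetical and is not part of this repo.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(repo_dir):
    """Return files that are still Git LFS pointer stubs, sorted by path."""
    pointers = []
    for path in sorted(Path(repo_dir).rglob("*")):
        # Skip directories and Git's own metadata.
        if ".git" in path.parts or not path.is_file():
            continue
        with path.open("rb") as fh:
            head = fh.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            pointers.append(path)
    return pointers
```

Calling find_lfs_pointers("/path/to/audioldm2_model") and getting an empty list suggests the weights were fetched. Any listed file can usually be fetched by running git lfs pull inside the clone.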

All prompts used in the paper are listed in config/example.txt.
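To prepare a custom prompt file, it can help to check how it will be read. The sketch below assumes the file holds one text prompt per line, with blank lines ignored. That layout is an assumption about config/example.txt, and load_prompts is a hypothetical helper rather than the loader used by sample.py.

```python
def load_prompts(prompt_file):
    """Read one prompt per line, skipping blank lines (assumed file layout)."""
    with open(prompt_file, "r", encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]
```

A file with three non-empty lines would then yield three prompts, each passed to the model as a separate text condition.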

3. Download Ckpts and Data

We use the VAE and vocoder from AudioLDM2, together with CLAP-L and T5-XXL. All of them can be downloaded directly from the table below, which also links the training scripts used in our experiments.

Note that one experiment had to be restarted after a machine malfunction, so some scripts contain resume options.

Model	Training steps	URL	Training scripts
VAE	-	link	-
Vocoder	-	link	-
T5-XXL	-	link	-
CLAP-L	-	link	-
FluxMusic-Small	200K	link	link
FluxMusic-Base	200K	link	link
FluxMusic-Large	200K	link	link
FluxMusic-Giant	200K	link	link

Note that the 200K-step ckpts were trained on a subset of the training set and were used to plot the scaling experiments and the case studies in the paper. The full version of the main results will be released right away.

For the construction of the training data, refer to the test.py file, which shows a simple way of combining different datasets into a JSON file.
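The idea of combining several datasets into one JSON file can be sketched as follows. This is not the code in test.py: the record layout (a JSON list of objects per dataset), the added "source" field, and the function name combine_manifests are all assumptions made for illustration.

```python
import json
from pathlib import Path


def combine_manifests(manifest_paths, output_path):
    """Merge several JSON list manifests into one JSON file.

    Each input is assumed to be a JSON list of dict records (hypothetical
    schema). Every record is tagged with the name of the file it came from.
    Returns the number of records written.
    """
    combined = []
    for manifest_path in manifest_paths:
        with open(manifest_path, "r", encoding="utf-8") as fh:
            records = json.load(fh)
        if not isinstance(records, list):
            raise ValueError(f"{manifest_path} does not contain a JSON list")
        source = Path(manifest_path).stem
        for record in records:
            # Copy the record so the loaded data is left unmodified.
            combined.append({**record, "source": source})
    with open(output_path, "w", encoding="utf-8") as fh:
        json.dump(combined, fh, ensure_ascii=False, indent=2)
    return len(combined)
```

Tagging each record with its source dataset makes it easy to filter or re-weight datasets later without rebuilding the file.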

Because of copyright issues, the data used in the paper must be downloaded by the user. A quick download link can be found on Hugging Face : ).

This is a research project, and it is recommended to try advanced products:

Acknowledgments

The codebase is based on the awesome Flux and AudioLDM2 repos.

