forked from hao-ai-lab/FastVideo
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Co-authored-by: rlsu9 <[email protected]>
Showing 9 changed files with 93 additions and 57 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,34 +1,41 @@ | ||
|
||
## ⚡ Finetune | ||
## ⚡ Full Finetune | ||
|
||
We support full fine-tuning for both the Mochi and Hunyuan models. Additionally, we provide Image-Video Mix finetuning. | ||
|
||
|
||
Ensure your data is prepared and preprocessed in the format specified in the [Data Preprocess](#-data-preprocess). | ||
Ensure your data is prepared and preprocessed in the format specified in [data_preprocess.md](docs/data_preprocess.md). For convenience, we also provide a Black Myth Wukong dataset, already preprocessed for Mochi, that can be downloaded directly: | ||
```bash | ||
python scripts/huggingface/download_hf.py --repo_id=FastVideo/Mochi-Black-Myth --local_dir=data/Mochi-Black-Myth --repo_type=dataset | ||
``` | ||
Download the original model weights with: | ||
```bash | ||
python scripts/huggingface/download_hf.py --repo_id=genmo/mochi-1-preview --local_dir=data/mochi --repo_type=model | ||
python scripts/huggingface/download_hf.py --repo_id=FastVideo/hunyuan --local_dir=data/hunyuan --repo_type=model | ||
``` | ||
|
||
|
||
FastVideo/BLACK-MYTH-YQ | ||
Then run the finetune with: | ||
Then you can run the finetune with: | ||
``` | ||
bash scripts/finetune/finetune_mochi.sh # for mochi | ||
bash scripts/finetune/finetune_hunyuan.sh # for hunyuan | ||
``` | ||
For Image-Video Mixture Fine-tuning, make sure to enable the --group_frame option in your script. | ||
|
||
**Note that we did not tune the hyperparameters in the provided script** | ||
|
||
## Lora Finetune | ||
## ⚡ Lora Finetune | ||
|
||
Currently, we only provide LoRA finetuning for the Mochi model. The command is: | ||
``` | ||
bash scripts/finetune/finetune_mochi_lora.sh | ||
``` | ||
### Minimum Hardware Requirement | ||
- 40 GB of GPU memory on each of 2 GPUs with LoRA. | ||
- 30 GB of GPU memory on each of 2 GPUs with CPU offload and LoRA. | ||
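The memory requirements above can be checked before a run. The following is a minimal sketch, not part of FastVideo: `count_gpus_with_mem` is a hypothetical helper, and the 40000 MiB threshold is an assumed stand-in for "40 GB", chosen slightly low because cards often report a little less than their nominal capacity.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FastVideo): reads one total-memory value
# in MiB per line on stdin and prints how many values meet the threshold.
count_gpus_with_mem() {
  local min_mib=$1
  awk -v min="$min_mib" '$1 + 0 >= min { n++ } END { print n + 0 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # Query the total memory of every visible GPU, in MiB, without headers or units.
  n=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits \
        | count_gpus_with_mem 40000)
  echo "GPUs with roughly 40 GB or more: $n (the LoRA setup above lists 2)"
else
  echo "nvidia-smi not found; cannot check GPU memory"
fi
```

Lowering the threshold to about 30000 would correspond to the CPU offload configuration.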
|
||
## Finetune with Both Image and Video | ||
Our codebase supports finetuning with both images and videos. | ||
|
||
```bash | ||
bash scripts/finetune/finetune_hunyuan.sh | ||
bash scripts/finetune/finetune_mochi_lora_mix.sh | ||
``` | ||
For Image-Video Mixture Fine-tuning, make sure to enable the --group_frame option in your script. | ||
|
||
|
||
### 💰Hardware requirement | ||
|
||
- 72 GB of VRAM is required to finetune the 10B Mochi model. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
export WANDB_BASE_URL="https://api.wandb.ai"
export WANDB_MODE=online

CUDA_VISIBLE_DEVICES=5 torchrun --nnodes 1 --nproc_per_node 1 \
    fastvideo/train.py \
    --seed 42 \
    --pretrained_model_name_or_path data/mochi \
    --cache_dir data/.cache \
    --data_json_path data/Image-Vid-Finetune-Mochi/videos2caption.json \
    --validation_prompt_dir data/Image-Vid-Finetune-Mochi/validation \
    --gradient_checkpointing \
    --train_batch_size=1 \
    --num_latent_t 14 \
    --sp_size 1 \
    --train_sp_batch_size 1 \
    --dataloader_num_workers 1 \
    --gradient_accumulation_steps=1 \
    --max_train_steps=2000 \
    --learning_rate=5e-6 \
    --mixed_precision=bf16 \
    --checkpointing_steps=200 \
    --validation_steps 100 \
    --validation_sampling_steps 64 \
    --checkpoints_total_limit 3 \
    --allow_tf32 \
    --ema_start_step 0 \
    --cfg 0.0 \
    --ema_decay 0.999 \
    --log_validation \
    --output_dir=data/outputs/HSH-Taylor-Finetune-Lora \
    --tracker_project_name HSH-Taylor-Finetune-Lora \
    --num_frames 91 \
    --group_frame \
    --lora_rank 128 \
    --lora_alpha 256 \
    --master_weight_type "bf16" \
    --use_lora
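
The training command above reads from several paths under `data/`. A pre-flight check for those inputs might look like the sketch below; `check_paths` is a hypothetical helper, not something the repository provides, and the paths are copied from the command's arguments.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of FastVideo): prints each missing
# path to stderr and returns non-zero if at least one path is absent.
check_paths() {
  local missing=0
  local p
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      echo "missing: $p" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Paths taken from the arguments of the training command.
if check_paths data/mochi \
    data/Image-Vid-Finetune-Mochi/videos2caption.json \
    data/Image-Vid-Finetune-Mochi/validation; then
  echo "all inputs found"
else
  echo "some inputs are missing; see the messages above"
fi
```

Running the check from the repository root before `torchrun` surfaces a missing download or preprocessing step early, rather than after the model has started loading.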