## 🧱 Data Preprocess

To save GPU memory, we precompute text embeddings and VAE latents to eliminate the need to load the text encoder and VAE during training.
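The caching idea can be sketched as follows. This is a minimal illustration under our own assumptions, not FastVideo's actual preprocessing code: `encode_text` and `encode_video` stand in for the real text encoder and VAE, which only need to be loaded during this one-off pass.

```python
import os
import pickle
import tempfile

def precompute(samples, encode_text, encode_video, out_dir):
    """Run the heavy encoders once per sample and cache the results to disk.

    After this pass, the training loop only reads the cached files, so the
    text encoder and VAE never occupy GPU memory during training.
    """
    paths = []
    for i, (prompt, video) in enumerate(samples):
        record = {"emb": encode_text(prompt), "latent": encode_video(video)}
        path = os.path.join(out_dir, f"{i}.pkl")
        with open(path, "wb") as f:
            pickle.dump(record, f)
        paths.append(path)
    return paths

# Toy stand-ins for the real text encoder and VAE, just to show the flow.
out_dir = tempfile.mkdtemp()
cached = precompute(
    [("a cat", [0.0, 1.0]), ("a dog", [1.0, 0.0])],
    encode_text=lambda p: len(p),
    encode_video=lambda v: [x * 0.5 for x in v],
    out_dir=out_dir,
)
```

The real pipeline saves actual embedding and latent tensors, but the structure is the same: one encode pass, then training reads only the cached files.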
We provide a sample dataset to help you get started. Download the source media using the following command:

```bash
python scripts/huggingface/download_hf.py --repo_id=FastVideo/Image-Vid-Finetune-Src --local_dir=data/Image-Vid-Finetune-Src --repo_type=dataset
```

To preprocess the dataset for fine-tuning or distillation, run:

```bash
bash scripts/preprocess/preprocess_mochi_data.sh   # for mochi
bash scripts/preprocess/preprocess_hunyuan_data.sh # for hunyuan
```

The preprocessed dataset will be stored in `Image-Vid-Finetune-Mochi` or `Image-Vid-Finetune-HunYuan`, respectively.
### Process your own dataset

If you wish to create your own dataset for fine-tuning or distillation, structure your video dataset in the following format:

```
path_to_dataset_folder/
├── media/
│   ├── 0.jpg
│   ├── 1.mp4
│   └── 2.jpg
├── video2caption.json
└── merge.txt
```

Format the JSON file as a list, where each item represents a media source.

For image media:

```json
{
    "path": "0.jpg",
    "cap": ["captions"]
}
```

For video media:

```json
{
    "path": "1.mp4",
    "resolution": {
        "width": 848,
        "height": 480
    },
    "fps": 30.0,
    "duration": 6.033333333333333,
    "cap": [
        "caption"
    ]
}
```
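As an illustration of how these records fit together, here is a small sketch that builds entries in this shape and serializes them. The helper names and the frame count of 181 (which yields the example's 6.0333 s duration at 30 fps) are our own assumptions, not part of FastVideo:

```python
import json

def image_entry(path, captions):
    # Image records only need the file path and a list of captions.
    return {"path": path, "cap": list(captions)}

def video_entry(path, width, height, fps, num_frames, captions):
    # Video records additionally carry resolution, fps, and duration in seconds.
    return {
        "path": path,
        "resolution": {"width": width, "height": height},
        "fps": fps,
        "duration": num_frames / fps,
        "cap": list(captions),
    }

entries = [
    image_entry("0.jpg", ["captions"]),
    video_entry("1.mp4", 848, 480, 30.0, 181, ["caption"]),
]
metadata = json.dumps(entries, indent=4)  # contents of video2caption.json
```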

Use a text file (`merge.txt`) to list the media source folder and the corresponding JSON metadata file:

```
path_to_media_source_folder,path_to_json_file
```
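For clarity, a loader for this one-line-per-pair format might look like the following sketch; this illustrates the expected format only and is not FastVideo's actual loader:

```python
def parse_merge_file(text):
    # Each non-empty line pairs a media folder with its metadata JSON:
    # "path_to_media_source_folder,path_to_json_file"
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            folder, json_path = line.split(",", 1)
            pairs.append((folder, json_path))
    return pairs

pairs = parse_merge_file(
    "path_to_dataset_folder/media,path_to_dataset_folder/video2caption.json\n"
)
```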

Adjust `DATA_MERGE_PATH` and `OUTPUT_DIR` in `scripts/preprocess/preprocess_****_data.sh` accordingly and run:

```bash
bash scripts/preprocess/preprocess_****_data.sh
```

The preprocessed data will be written to `OUTPUT_DIR`, and the generated `videos2caption.json` can be used in the finetune and distill scripts.
## 🎯 Distill

Our distillation recipe is based on [Phased Consistency Model](https://github.com/G-U-N/Phased-Consistency-Model). We did not find significant improvement from multi-phase distillation, so we keep a one-phase setup similar to the original latent consistency model recipe.
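To make the one-phase setup concrete, here is a schematic of a consistency-distillation objective in its simplest form, with toy stand-in models. Every name here is hypothetical and this is not FastVideo's training code; the general idea is that the student's prediction at a noisy step should match an EMA copy's prediction after the frozen teacher moves the sample one ODE step earlier:

```python
def mse(a, b):
    # Mean squared error between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def consistency_loss(student, ema_student, teacher_step, x_t, t, s):
    # Student prediction at the noisier step t ...
    pred_t = student(x_t, t)
    # ... should match the EMA student's prediction after the frozen
    # teacher moves the sample along the ODE trajectory from t to s < t.
    x_s = teacher_step(x_t, t, s)
    target = ema_student(x_s, s)
    return mse(pred_t, target)

# Toy check with linear stand-in models.
loss = consistency_loss(
    student=lambda x, t: [v * 0.9 for v in x],
    ema_student=lambda x, t: [v * 0.9 for v in x],
    teacher_step=lambda x, t, s: [v * (s / t) for v in x],
    x_t=[1.0, 2.0],
    t=1.0,
    s=0.5,
)
```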

We use the [MixKit](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.1.0/tree/main/all_mixkit) dataset for distillation. To avoid running the text encoder and VAE during training, we preprocess all data to generate text embeddings and VAE latents.

Preprocessing instructions can be found in [data_preprocess.md](#-data-preprocess). For convenience, we also provide preprocessed data that can be downloaded directly using the following command:

```bash
python scripts/huggingface/download_hf.py --repo_id=FastVideo/HD-Mixkit-Finetune-Hunyuan --local_dir=data/HD-Mixkit-Finetune-Hunyuan --repo_type=dataset
```
Next, download the original model weights with:

```bash
python scripts/huggingface/download_hf.py --repo_id=FastVideo/hunyuan --local_dir=data/hunyuan --repo_type=model
```

To launch the distillation process, use the following commands:

```bash
bash scripts/distill/distill_mochi.sh   # for mochi
bash scripts/distill/distill_hunyuan.sh # for hunyuan
```

We also provide an optional script for distillation with an adversarial loss at `fastvideo/distill_adv.py`; we tried it but did not observe significant improvements.