
Multimodal Instruction Tuning with Conditional Mixture of LoRA (ACL 2024)

VT-NLP/MixLoRA

Requirements and Installation

To set up your environment, run the following commands:

conda create -n mixlora python=3.8 -y
conda activate mixlora
sh setup.sh

Data Preparation

Training Dataset

Please download the dataset from Vision-Flan.

Evaluation Dataset

The evaluation dataset we used can be downloaded from here.

Training & Inference

Set image_folder and data_path in the fine-tuning scripts to the paths prepared in the Data Preparation step.

Training

To fine-tune the model, run the following command:

sh scripts/finetune_mixlora.sh <routing-type> <num-experts> <num-rank>
  • <routing-type>: Specify the routing type (input for instance-based IFS routing alone, input_lora_a_param for combined instance-based IFS routing and CFS routing).
  • <num-experts>: Specify the number of experts (LoRA factors).
  • <num-rank>: Specify the rank of each LoRA factor.
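Conceptually, the router produces per-instance gate weights over the <num-experts> LoRA factors, and the gated low-rank update is added to the output of the frozen weight. Below is a minimal NumPy sketch of that idea; the shapes, variable names, and softmax router are illustrative assumptions, not the repository's actual implementation:

```python
# Illustrative sketch of a conditional mixture-of-LoRA forward pass.
# All names and shapes here are assumptions for exposition only.
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 16, 16       # frozen layer dimensions
num_experts = 4            # corresponds to <num-experts>
rank = 2                   # corresponds to <num-rank>

W = rng.normal(size=(d_out, d_in))                     # frozen base weight
A = rng.normal(size=(num_experts, rank, d_in)) * 0.01  # per-expert LoRA A factors
B = np.zeros((num_experts, d_out, rank))               # per-expert LoRA B factors (zero-init)
router = rng.normal(size=(num_experts, d_in))          # instance-level router

def mixlora_forward(x):
    """Gate the LoRA experts per instance, then apply W x + B_mix (A_mix x)."""
    logits = router @ x
    gates = np.exp(logits - logits.max())
    gates = gates / gates.sum()                # softmax over experts
    # Compose an instance-specific low-rank update from the gated factors.
    A_mix = np.einsum("e,erd->rd", gates, A)   # (rank, d_in)
    B_mix = np.einsum("e,edr->dr", gates, B)   # (d_out, rank)
    return W @ x + B_mix @ (A_mix @ x)

x = rng.normal(size=d_in)
y = mixlora_forward(x)
print(y.shape)  # → (16,)
```

Because the B factors start at zero (the usual LoRA initialization), the mixture contributes nothing at step zero and training grows the update from the identity behavior of the frozen layer.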

The projector weights mm_projector.bin can be downloaded from the original LLaVA repo.

The trained model checkpoints can be downloaded here.

Inference

To run inference on all the multimodal tasks:

sh scripts/run_eval.sh <model-path> <data-dir>
  • <model-path>: Specify the path to the model.
  • <data-dir>: Specify the path to the evaluation dataset.

To run inference on MME:

sh scripts/run_eval_mme.sh <model-path> <data-dir>
  • <model-path>: Specify the path to the model.
  • <data-dir>: Specify the path to the MME dataset.

Acknowledgement

The codebase is built upon LLaVA. We would like to thank the authors for publicly releasing their code.

Citation

@article{shen2024multimodal,
  title={Multimodal Instruction Tuning with Conditional Mixture of LoRA},
  author={Shen, Ying and Xu, Zhiyang and Wang, Qifan and Cheng, Yu and Yin, Wenpeng and Huang, Lifu},
  journal={arXiv preprint arXiv:2402.15896},
  year={2024}
}
