To set up your environment, run the following commands:
```bash
conda create -n mixlora python=3.8 -y
conda activate mixlora
sh setup.sh
```
Please download the dataset from Vision-Flan.
The evaluation dataset we used can be downloaded from here.
Specify the `image_folder` and `data_path` in the fine-tuning scripts according to the data preparation.
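As an illustration, these two settings can be pointed at your local copy of the data inside the fine-tuning script; the paths below are placeholders assuming a Vision-Flan-style layout, so adjust them to wherever you stored the data:

```bash
# Illustrative placeholders only -- adapt both paths to your own data layout.
data_path=./playground/data/vision_flan/annotations.json   # instruction-tuning annotations
image_folder=./playground/data/vision_flan/images          # root directory containing the images
```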
To fine-tune the model, run the following command (an example invocation is given after the argument list):
```bash
sh scripts/finetune_mixlora.sh <routing-type> <num-experts> <num-rank>
```
- `<routing-type>`: Specify the type of routing (`input` for instance-based IFS routing alone, `input_lora_a_param` for combined instance-based IFS routing and CFS routing).
- `<num-experts>`: Specify the number of factors.
- `<num-rank>`: Specify the rank.
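For example, an invocation with combined IFS and CFS routing might look like the following; the expert count and rank are illustrative values, not recommended settings:

```bash
# Illustrative invocation -- routing type, number of experts, and rank are placeholders.
sh scripts/finetune_mixlora.sh input_lora_a_param 4 8
```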
The projector weights `mm_projector.bin` can be downloaded from the original LLaVA repo.
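If your fine-tuning script follows the upstream LLaVA convention, the downloaded projector is supplied to the training entry point through a pretrained-adapter argument; the flag name and path below are assumptions based on LLaVA's training scripts rather than this repo's documentation:

```bash
# Path is a placeholder for wherever you saved the downloaded weights.
PROJECTOR=./checkpoints/mm_projector.bin
# Assumed LLaVA-style argument, passed to the training command, e.g.:
#   --pretrain_mm_mlp_adapter "$PROJECTOR"
```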
The trained model checkpoints can be found here.
To run inference on all the multimodal tasks:
```bash
sh scripts/run_eval.sh <model-path> <data-dir>
```
- `<model-path>`: Specify the path to the model.
- `<data-dir>`: Specify the path to the evaluation dataset.
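For example (both paths are placeholders for your own checkpoint and data locations):

```bash
# Illustrative paths -- substitute your trained checkpoint and evaluation data directory.
sh scripts/run_eval.sh ./checkpoints/mixlora-7b ./data/eval
```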
To run inference on MME:
```bash
sh scripts/run_eval_mme.sh <model-path> <data-dir>
```
- `<model-path>`: Specify the path to the model.
- `<data-dir>`: Specify the path to the MME dataset.
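Likewise, an illustrative MME run (paths are placeholders):

```bash
# Illustrative paths -- substitute your trained checkpoint and the MME benchmark directory.
sh scripts/run_eval_mme.sh ./checkpoints/mixlora-7b ./data/MME
```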
The codebase is built upon LLaVA. We would like to thank the authors for publicly releasing their code.
```bibtex
@article{shen2024multimodal,
  title={Multimodal Instruction Tuning with Conditional Mixture of LoRA},
  author={Shen, Ying and Xu, Zhiyang and Wang, Qifan and Cheng, Yu and Yin, Wenpeng and Huang, Lifu},
  journal={arXiv preprint arXiv:2402.15896},
  year={2024}
}
```