
Olympus: A Universal Task Router for Computer Vision Tasks (CVPR 2025, $${\color{red}Highlight}$$)

PDF arXiv Project Page Weights Dataset

Official implementation of "Olympus: A Universal Task Router for Computer Vision Tasks"

♥️ If you find our project helpful for your research, please give us a 🌟 and cite our paper 📑 :)

📣 News

  • Release the code for integration with task-specific models.
  • Release the training & inference code.
  • Release Olympus datasets.
  • Release the model of Olympus.

🔅 Overview

(Figure: overview of the Olympus framework.)

Getting Started

🛠️ Environment Installation

To set up the environment, run the following commands in your shell:

git clone https://github.com/yuanze-lin/Olympus.git
cd Olympus
conda create -n olympus python==3.10 -y
conda activate olympus
pip install -r requirements.txt

This creates the olympus conda environment used in our experiments.
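
Optionally, sanity-check the install. This one-liner assumes PyTorch is among the pinned requirements, which is typical for LLaVA-based repos:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"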

Download Models & Data

We share our collected Olympus dataset as follows:

| Instruction | Link |
| ----------- | ---- |
| Olympus Dataset | Olympus_dataset |
| Olympus Fine-tuning Data | Olympus.json |
  • Olympus_dataset: There are 20 JSON files under individual task folders, each corresponding to a specific task; refer to the routing token definitions in our paper to identify the task associated with each JSON file. Chain-of-action data is provided in coa.json. Each of these 21 JSON files includes both training and test data (see the inspection sketch after this list). OlympusInstruct.json and OlympusBench.json contain the collected OlympusInstruct and OlympusBench datasets, respectively.
  • Olympus.json: The final instruction data for fine-tuning.
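
To get a feel for the data layout, you can inspect any of the JSON files once downloaded (see step (2) below). A minimal sketch, not part of the repo; the path and the assumption that each file holds a top-level list of records are illustrative, so check the actual files for the exact schema:

import json

# Illustrative path: any of the task JSON files under the jsons folder works.
with open("jsons/coa.json") as f:
    records = json.load(f)  # assumed: a top-level list of instruction records

print(f"{len(records)} records; first record:")
print(json.dumps(records[0], indent=2))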

(1) Download the Olympus model:

python download_olympus.py

It will save the Olympus model under the ckpts folder.

(2) Download the Olympus data for fine-tuning:

python download_olympus_dataset.py

It saves the fine-tuning instruction data Olympus.json to the train_data folder, while all other JSON files are stored in the newly created jsons folder. Note that Olympus.json is a combination of llava_v1_5_mix665k.json and OlympusInstruct, our collected instruction data covering 20 tasks.

If you want to merge the data manually, download llava_v1_5_mix665k.json into the jsons folder, then run the merge script:

python scripts/merge_data.py

You can specify which tasks to merge by referring to the script scripts/merge_tasks.py.
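
Conceptually, the merge just concatenates the instruction records from the two sources. A minimal sketch of that idea, assuming each file holds a top-level list of records (scripts/merge_data.py remains the authoritative implementation):

import json

def load_records(path):
    # Assumed: each file stores a top-level list of instruction records.
    with open(path) as f:
        return json.load(f)

mix665k = load_records("jsons/llava_v1_5_mix665k.json")
olympus = load_records("jsons/OlympusInstruct.json")

merged = mix665k + olympus  # plain concatenation of the two record lists
with open("train_data/Olympus.json", "w") as f:
    json.dump(merged, f)

print(f"{len(mix665k)} + {len(olympus)} = {len(merged)} records")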

(3) Download the Mipha-3B model for fine-tuning:

python download_mipha_3b.py

It will save the Mipha-3B model under the ckpts folder.

Inference

Run the following code for inference:

model_name=Olympus
MODELDIR=ckpts/$model_name

python predict.py \
  --prompt "Generate an image of a fluffy orange cat lounging on a windowsill, \
with sunlight streaming through the glass and casting soft shadows to create a cozy atmosphere. \
Next, would it be possible to change the cat's color to white? This change will make it more eye-catching. \
In the following step, produce a high-resolution 3D model based on the modified image. \
At the next point, please show a video of a cat and a dog running on a playground." \
  --model-path $MODELDIR \
  --temperature 0 \
  --conv-mode v0

Alternatively, you can run bash predict.sh as we did.

The prediction should look like the following:

Input Prompt:  Generate an image of a fluffy orange cat lounging on a windowsill,
with sunlight streaming through the glass and casting soft shadows to create a cozy atmosphere.
Next, would it be possible to change the cat's color to white? This change will make it more eye-catching.
In the following step, produce a high-resolution 3D model based on the modified image.
At the next point, please show a video of a cat and a dog running on a playground.

Output:  <image_gen>a fluffy orange cat lounging on a windowsill, with sunlight streaming
through the glass and casting soft shadows to create a cozy atmosphere.</image_gen>
<image_edit>change the cat's color to white.</image_edit>
<3D_gen_image>produce a high-resolution 3D model based on the modified image.</3D_gen_image>
<video_gen>a cat and a dog running on a playground.</video_gen>

Change the --prompt argument to customize the input prompt as needed.
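
Downstream, each routed sub-task and its prompt can be recovered from the model output with a small regex. A minimal sketch (not part of the repo), using the token format shown in the example output above:

import re

# Truncated copy of the example output above.
output = (
    "<image_gen>a fluffy orange cat lounging on a windowsill.</image_gen>"
    "<image_edit>change the cat's color to white.</image_edit>"
    "<video_gen>a cat and a dog running on a playground.</video_gen>"
)

# Each sub-task appears as <token>prompt</token>; \w+ also matches tokens
# like 3D_gen_image. Pairs are returned in order of appearance.
for token, prompt in re.findall(r"<(\w+)>(.*?)</\1>", output, flags=re.DOTALL):
    print(f"{token}: {prompt}")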

Visual Instruction Tuning

Please refer here to prepare the instruction-tuning data. In particular, store the images from the different datasets under the train_data folder.

Run the following code to fine-tune the model:

bash scripts/mipha/finetune.sh

Evaluation

To evaluate the model's performance on different benchmarks, see Evaluation.md.

Please place the evaluation data under the eval folder. The evaluation scripts live under scripts/mipha/eval/. For example, to test the model's performance on the VQAv2 dataset, simply run:

bash scripts/mipha/eval/vqav2.sh
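
To run several benchmarks back to back, a simple loop over the eval scripts works. Script names other than vqav2.sh are illustrative here; check scripts/mipha/eval/ for the actual filenames:

for task in vqav2 gqa textvqa; do
  bash scripts/mipha/eval/$task.sh
done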

🔮 Supported Capabilities (Covering 20 Tasks)

(Figure: the capabilities supported by Olympus across 20 tasks.)

🏂 Diverse Applications

(Figure: examples of diverse applications.)

Citation

If you find Olympus useful for your research and applications, please cite using this BibTeX:

@article{lin2024olympus,
  title={Olympus: A Universal Task Router for Computer Vision Tasks},
  author={Lin, Yuanze and Li, Yunsheng and Chen, Dongdong and Xu, Weijian and Clark, Ronald and Torr, Philip HS},
  journal={arXiv preprint arXiv:2412.09612},
  year={2024}
}

Acknowledgement

Our project is built upon the following foundations:

  • Mipha: An impressive open-source project for lightweight vision-language assistants
  • LLaVA: A powerful open-source vision-language assistant project