
Olympus: A Universal Task Router for Computer Vision Tasks (CVPR 2025, $${\color{red}Highlight}$$)

PDF arXiv Project Page Weights Dataset

Official implementation of "Olympus: A Universal Task Router for Computer Vision Tasks"

♥️ If you find our project helpful for your research, please give us a 🌟 and cite our paper 📑 :)

📣 News

  • Release the code for integration with task-specific models.
  • Release the training & inference code.
  • Release Olympus datasets.
  • Release the model of Olympus.

🔅 Overview

(Figure: overview of the Olympus framework.)

Getting Started

🛠️ Environment Installation

To set up the environment, run the following commands in your shell:

git clone https://github.com/yuanze-lin/Olympus.git
cd Olympus
conda create -n olympus python==3.10 -y
conda activate olympus
pip install -r requirements.txt

This creates the olympus conda environment used in our experiments.
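
Optionally, sanity-check the install. This one-liner assumes PyTorch is among the pinned requirements, which is typical for LLaVA-based repos:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"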

Download Models & Data

We share our collected Olympus dataset as follows:

| Instruction | Link |
| ----------- | ---- |
| Olympus Dataset | Olympus_dataset |
| Olympus Fine-tuning Data | Olympus.json |
  • Olympus_dataset: There are 20 JSON files under individual task folders, each corresponding to a specific task; refer to the routing token definitions in our paper to identify the task associated with each JSON file. Chain-of-action data is provided in coa.json. Each of these 21 JSON files includes both training and test data (see the inspection sketch after this list). OlympusInstruct.json and OlympusBench.json contain the collected OlympusInstruct and OlympusBench datasets, respectively.
  • Olympus.json: The final instruction data for fine-tuning.
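
To get a feel for the data layout, you can inspect any of the JSON files once downloaded (see step (2) below). A minimal sketch, not part of the repo; the path and the assumption that each file holds a top-level list of records are illustrative, so check the actual files for the exact schema:

import json

# Illustrative path: any of the task JSON files under the jsons folder works.
with open("jsons/coa.json") as f:
    records = json.load(f)  # assumed: a top-level list of instruction records

print(f"{len(records)} records; first record:")
print(json.dumps(records[0], indent=2))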

(1) Download the Olympus model:

python download_olympus.py

It will save the Olympus model under the ckpts folder.

(2) Download the Olympus data for fine-tuning:

python download_olympus_dataset.py

It saves the fine-tuning instruction data Olympus.json to the train_data folder, while all other JSON files are stored in the newly created jsons folder. Note that Olympus.json is a combination of llava_v1_5_mix665k.json and OlympusInstruct, our collected instruction data covering 20 tasks.

If you want to merge the data manually, download llava_v1_5_mix665k.json into the jsons folder, then run the merge script:

python scripts/merge_data.py

You can specify which tasks to merge by referring to the script scripts/merge_tasks.py.
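
Conceptually, the merge just concatenates the instruction records from the two sources. A minimal sketch of that idea, assuming each file holds a top-level list of records (scripts/merge_data.py remains the authoritative implementation):

import json

def load_records(path):
    # Assumed: each file stores a top-level list of instruction records.
    with open(path) as f:
        return json.load(f)

mix665k = load_records("jsons/llava_v1_5_mix665k.json")
olympus = load_records("jsons/OlympusInstruct.json")

merged = mix665k + olympus  # plain concatenation of the two record lists
with open("train_data/Olympus.json", "w") as f:
    json.dump(merged, f)

print(f"{len(mix665k)} + {len(olympus)} = {len(merged)} records")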

(3) Download the Mipha-3B model for fine-tuning:

python download_mipha_3b.py

It will save the Mipha-3B model under the ckpts folder.

Inference

Run the following code for inference:

model_name=Olympus
MODELDIR=ckpts/$model_name

python predict.py \
  --prompt "Generate an image of a fluffy orange cat lounging on a windowsill, \
with sunlight streaming through the glass and casting soft shadows to create a cozy atmosphere. \
Next, would it be possible to change the cat's color to white? This change will make it more eye-catching. \
In the following step, produce a high-resolution 3D model based on the modified image. \
At the next point, please show a video of a cat and a dog running on a playground." \
  --model-path $MODELDIR \
  --temperature 0 \
  --conv-mode v0

Alternatively, you can run bash predict.sh as we did.

The prediction should look like the following:

Input Prompt:  Generate an image of a fluffy orange cat lounging on a windowsill,
with sunlight streaming through the glass and casting soft shadows to create a cozy atmosphere.
Next, would it be possible to change the cat's color to white? This change will make it more eye-catching.
In the following step, produce a high-resolution 3D model based on the modified image.
At the next point, please show a video of a cat and a dog running on a playground.

Output:  <image_gen>a fluffy orange cat lounging on a windowsill, with sunlight streaming
through the glass and casting soft shadows to create a cozy atmosphere.</image_gen>
<image_edit>change the cat's color to white.</image_edit>
<3D_gen_image>produce a high-resolution 3D model based on the modified image.</3D_gen_image>
<video_gen>a cat and a dog running on a playground.</video_gen>

Change the --prompt argument to customize the input prompt as needed.
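
Downstream, each routed sub-task and its prompt can be recovered from the model output with a small regex. A minimal sketch (not part of the repo), using the token format shown in the example output above:

import re

# Truncated copy of the example output above.
output = (
    "<image_gen>a fluffy orange cat lounging on a windowsill.</image_gen>"
    "<image_edit>change the cat's color to white.</image_edit>"
    "<video_gen>a cat and a dog running on a playground.</video_gen>"
)

# Each sub-task appears as <token>prompt</token>; \w+ also matches tokens
# like 3D_gen_image. Pairs are returned in order of appearance.
for token, prompt in re.findall(r"<(\w+)>(.*?)</\1>", output, flags=re.DOTALL):
    print(f"{token}: {prompt}")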

Visual Instruction Tuning

Please refer here to prepare the instruction-tuning data. In particular, store the images from the different datasets under the train_data folder.

Run the following code to fine-tune the model:

bash scripts/mipha/finetune.sh

Evaluation

To evaluate the model's performance on different benchmarks, see Evaluation.md.

Please place the evaluation data under the eval folder. The evaluation scripts live under scripts/mipha/eval/. For example, to test the model's performance on the VQAv2 dataset, simply run:

bash scripts/mipha/eval/vqav2.sh
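
To run several benchmarks back to back, a simple loop over the eval scripts works. Script names other than vqav2.sh are illustrative here; check scripts/mipha/eval/ for the actual filenames:

for task in vqav2 gqa textvqa; do
  bash scripts/mipha/eval/$task.sh
done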

🔮 Supported Capabilities (Covering 20 Tasks)

(Figure: the capabilities supported by Olympus across 20 tasks.)

🏂 Diverse Applications

(Figure: examples of diverse applications.)

Citation

If you find Olympus useful for your research and applications, please cite using this BibTeX:

@article{lin2024olympus,
  title={Olympus: A Universal Task Router for Computer Vision Tasks},
  author={Lin, Yuanze and Li, Yunsheng and Chen, Dongdong and Xu, Weijian and Clark, Ronald and Torr, Philip HS},
  journal={arXiv preprint arXiv:2412.09612},
  year={2024}
}

Acknowledgement

Our project is built upon the following foundations:

  • Mipha: An impressive open-source project for lightweight vision-language assistants
  • LLaVA: A powerful open-source vision-language assistant project