[Paper] [Blog] [Colab] [Spaces]
OFA is a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image generation, visual grounding, image captioning, image classification, text generation, etc.) into a simple sequence-to-sequence learning framework. For more information, please refer to our paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework.
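As a rough illustration of the unified formulation, every task is expressed as an instruction fed to the same encoder-decoder, which then emits plain tokens (text, bounding-box coordinates, or image codes). The sketch below paraphrases a few of the instruction templates from the paper; the template dictionary and helper function are our own illustration, not part of this codebase.

```python
# Illustrative only: how different tasks reduce to plain instructions for one
# seq2seq model. Templates paraphrase the paper; the names here are ours.
TASK_TEMPLATES = {
    "image_captioning": "what does the image describe?",
    "vqa": "{question}",
    "visual_grounding": 'which region does the text "{query}" describe?',
    "image_generation": 'what is the complete image? caption: "{caption}"',
}

def build_instruction(task: str, **fields) -> str:
    """Fill in the instruction text; the image (if any) goes to the encoder separately."""
    return TASK_TEMPLATES[task].format(**fields)

print(build_instruction("image_captioning"))
print(build_instruction("visual_grounding", query="the dog on the left"))
```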
- 2022.2.15: Released finetuning & inference code/checkpoints for referring expression comprehension, as well as a Colab notebook and a demo in Hugging Face Spaces.
- 2022.2.13: Released the demo of image captioning. Have fun!
- 2022.2.11: Released the Colab notebook for image captioning. Enjoy!
- 2022.2.11: Released the pretrained checkpoint of OFA-Large and the complete two-stage finetuning code for image captioning.
- 2022.2.10: Released the inference code & finetuned checkpoint for image captioning, which can reproduce the results on the COCO Karpathy test split (149.6 CIDEr). OFA also achieves No.1 on the COCO image captioning online leaderboard [Link] (marked as M6-Team).
- To release finetuning and inference code for multimodal downstream tasks soon, including image captioning, VQA, text-to-image generation, SNLI-VE, referring expression comprehension, etc.
- To release pretraining code soon.
- python 3.7.4
- pytorch 1.8.1
- torchvision 0.9.1
- Java 1.8 (for COCO evaluation)
git clone https://github.com/OFA-Sys/OFA
cd OFA
pip install -r requirements.txt
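A quick way to sanity-check that the environment matches the versions listed above (this snippet is just a convenience, not part of the repo):

```python
# Environment sanity check against the requirements above.
import torch
import torchvision

print(torch.__version__)          # expect 1.8.1
print(torchvision.__version__)    # expect 0.9.1
print(torch.cuda.is_available())  # True is expected if you plan to finetune on GPU
```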
See datasets.md and checkpoints.md.
To release soon:)
Below we provide instructions for finetuning and inference on different downstream tasks.
- Download data (see datasets.md) and models (see checkpoints.md) and put them in the correct directory
- Train
cd run_scripts/caption
nohup sh train_caption_stage1.sh > train_stage1.out & # stage1, train with cross-entropy loss
nohup sh train_caption_stage2.sh > train_stage2.out & # stage2, load the best ckpt of stage1 and train with CIDEr optimization (see the sketch after this subsection)
- Inference
cd run_scripts/caption ; sh evaluate_caption.sh # inference & evaluate
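Stage 2 finetunes the stage-1 checkpoint with CIDEr optimization, i.e., a policy-gradient objective (commonly implemented as self-critical sequence training) that rewards sampled captions scoring higher than the model's own greedy captions. The snippet below is a minimal, self-contained sketch of that idea; it is not the criterion used by the training scripts, and `cider_reward` is a hypothetical stand-in for a real CIDEr scorer (e.g. pycocoevalcap).

```python
import torch

def cider_reward(captions, references):
    # Hypothetical stand-in for a real CIDEr scorer: one score per caption.
    return torch.rand(len(captions))

def scst_loss(sample_logprobs, sampled_captions, greedy_captions, references):
    """sample_logprobs: (batch,) sum of log-probs of each sampled caption."""
    reward = cider_reward(sampled_captions, references)    # score of sampled captions
    baseline = cider_reward(greedy_captions, references)   # score of greedy captions
    advantage = reward - baseline                          # > 0 if sampling beats greedy
    # REINFORCE with the greedy score as baseline: raise the likelihood of
    # captions that outscore greedy decoding, lower it otherwise.
    return -(advantage.detach() * sample_logprobs).mean()

# Toy usage with fake log-probs:
logp = torch.tensor([-3.2, -2.7], requires_grad=True)
loss = scst_loss(logp, ["a dog runs", "a cat sits"], ["a dog", "a cat"], [["ref1"], ["ref2"]])
loss.backward()
```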
- Download data (see datasets.md) and models (see checkpoints.md) and put them in the correct directory
- Train
cd run_scripts/refcoco
nohup sh train_refcoco.sh > train_refcoco.out & # finetune for refcoco
nohup sh train_refcocoplus.sh > train_refcocoplus.out & # finetune for refcoco+
nohup sh train_refcocog.sh > train_refcocog.out & # finetune for refcocog
- Inference
cd run_scripts/refcoco ; sh evaluate_refcoco.sh # inference & evaluate for refcoco/refcoco+/refcocog
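Referring expression comprehension is evaluated with Acc@0.5: a predicted box counts as correct when its IoU with the ground-truth box is at least 0.5. Below is a minimal sketch of that metric (our own illustration, not the repo's evaluation code); boxes are (x1, y1, x2, y2).

```python
def box_iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def acc_at_05(predictions, ground_truths):
    # Fraction of predictions whose IoU with the ground truth is >= 0.5.
    hits = sum(box_iou(p, g) >= 0.5 for p, g in zip(predictions, ground_truths))
    return hits / len(ground_truths)

print(acc_at_05([(10, 10, 50, 50)], [(12, 8, 48, 52)]))  # 1.0, since IoU ~ 0.83
```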
Below we provide examples of OFA in text-to-image generation and open-ended VQA. We also demonstrate its performance on an unseen task (grounded QA) as well as an unseen domain (visual grounding on images from unseen domains).
Please cite our paper if you find it helpful :)
@article{wang2022OFA,
title={Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework},
author={Wang, Peng and Yang, An and Men, Rui and Lin, Junyang and Bai, Shuai and Li, Zhikang and Ma, Jianxin and Zhou, Chang and Zhou, Jingren and Yang, Hongxia},
journal={arXiv e-prints},
pages={arXiv--2202},
year={2022}
}
Apache-2.0