
ViTamin for Open-Vocabulary Segmentation

This folder contains the implementation of ViTamin for open-vocabulary segmentation.

We propose Sliding FC-CLIP, which adapts ViTamin within the FC-CLIP framework. Thanks, FC-CLIP!
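At a high level, the sliding adaptation runs the fixed-resolution ViTamin/ViT image encoder over overlapping windows of a high-resolution input and merges the per-window dense features, so the frozen encoder can serve as an FC-CLIP backbone. The snippet below is only an illustrative sketch of that idea under our own assumptions: the generic `encoder` callable, window size, stride, and averaging scheme are placeholders, not the code in this folder.

```python
import torch

def _starts(size, win, stride):
    # Window start offsets that cover [0, size), including a final window
    # flush with the right/bottom edge.
    last = max(size - win, 0)
    starts = list(range(0, last + 1, stride))
    if starts[-1] != last:
        starts.append(last)
    return starts

def sliding_encode(image, encoder, win=336, stride=224):
    """Sketch of sliding-window feature extraction.

    image: (C, H, W) tensor with H, W >= win; encoder: maps a (1, C, win, win)
    crop to a (1, D, h, w) dense feature map. Assumes H, W, and the window
    offsets are multiples of the encoder's spatial feature stride.
    """
    C, H, W = image.shape
    feats = counts = None
    for top in _starts(H, win, stride):
        for left in _starts(W, win, stride):
            crop = image[:, top:top + win, left:left + win].unsqueeze(0)
            f = encoder(crop)[0]                    # (D, h, w) features for this window
            if feats is None:
                D, h, w = f.shape
                sh, sw = win // h, win // w         # spatial stride of the features
                feats = image.new_zeros(D, H // sh, W // sw)
                counts = image.new_zeros(1, H // sh, W // sw)
            r, c = top // sh, left // sw
            feats[:, r:r + h, c:c + w] += f
            counts[:, r:r + h, c:c + w] += 1
    return feats / counts.clamp(min=1)              # average overlapping windows
```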

Installation and Getting Started

Please follow FC-CLIP's installation instructions and Getting Started with FC-CLIP.

The configuration for ViTamin is provided in "./configs/coco/panoptic-segmentation/fcclip/fcclip_vitamin_l_eval_ade20k.yaml".
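If you prefer to build the model programmatically rather than through the repo's training script, the sketch below follows the usual detectron2 / Mask2Former pattern. The `fcclip` config-registration helpers (`add_maskformer2_config`, `add_fcclip_config`) and the checkpoint path are assumptions based on that convention; verify the actual module names in this folder.

```python
# Sketch only: load the ViTamin config and a checkpoint with detectron2's standard APIs.
from detectron2.config import get_cfg
from detectron2.projects.deeplab import add_deeplab_config
from detectron2.modeling import build_model
from detectron2.checkpoint import DetectionCheckpointer

from fcclip import add_maskformer2_config, add_fcclip_config  # assumed helper names

cfg = get_cfg()
add_deeplab_config(cfg)
add_maskformer2_config(cfg)
add_fcclip_config(cfg)
cfg.merge_from_file("configs/coco/panoptic-segmentation/fcclip/fcclip_vitamin_l_eval_ade20k.yaml")
cfg.MODEL.WEIGHTS = "/path/to/vitamin_l_fcclip_checkpoint.pth"  # illustrative path
cfg.freeze()

model = build_model(cfg)                               # builds the segmentation meta-architecture
DetectionCheckpointer(model).load(cfg.MODEL.WEIGHTS)   # loads the released weights
model.eval()
```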

🔥 Model Zoo

| image encoder | ADE20K (A-150) PQ | Cityscapes PQ | Mapillary Vistas PQ | ADE20K (A-150) mIoU | ADE20K-Full (A-847) mIoU | Pascal Context 459 (PC-459) mIoU | Pascal Context 59 (PC-59) mIoU | Pascal VOC 21 (PAS-21) mIoU | download |
|---|---|---|---|---|---|---|---|---|---|
| ConvNeXt-L | 26.8 | 44.0 | 18.3 | 34.1 | 14.8 | 18.2 | 58.4 | 81.8 | checkpoint |
| ViT-L/14 | 24.6 | 40.7 | 16.5 | 31.8 | 14.3 | 18.3 | 55.1 | 81.5 | |
| ViTamin-L | 27.3 | 44.0 | 18.2 | 35.6 | 16.1 | 20.4 | 58.4 | 83.4 | checkpoint |

Citing ViTamin

@inproceedings{chen2024vitamin,
  title={ViTamin: Designing Scalable Vision Models in the Vision-language Era},
  author={Chen, Jieneng and Yu, Qihang and Shen, Xiaohui and Yuille, Alan and Chen, Liang-Chieh},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}

Original FC-CLIP README

This repo contains the code for our paper Convolutions Die Hard: Open-Vocabulary Panoptic Segmentation with Single Frozen Convolutional CLIP.


FC-CLIP is a universal model for open-vocabulary image segmentation, consisting of a class-agnostic segmenter, an in-vocabulary classifier, and an out-of-vocabulary classifier. With everything built on a single shared frozen convolutional CLIP model, FC-CLIP not only achieves state-of-the-art performance on various open-vocabulary segmentation benchmarks, but also enjoys much lower training cost (3.2 days with 8 V100 GPUs) and testing cost than prior art.
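At inference, the two classifiers are typically fused with a geometric ensemble: classes seen during training lean on the trained in-vocabulary head, while novel classes lean on scores pooled from the frozen CLIP features. The snippet below is a hedged sketch of that fusion; the weights `alpha`/`beta` and the function name are illustrative, not the repo's exact API.

```python
import torch

def geometric_ensemble(in_logits, out_logits, seen, alpha=0.4, beta=0.8):
    """Sketch of fusing in-vocabulary and out-of-vocabulary classifier scores.

    in_logits, out_logits: (num_masks, num_classes) from the trained head and
    from mask-pooled frozen-CLIP features; seen: bool (num_classes,) marking
    classes present in the training vocabulary. alpha/beta are illustrative.
    """
    p_in = in_logits.softmax(dim=-1)
    p_out = out_logits.softmax(dim=-1)
    # Seen classes rely more on the trained classifier, novel classes more on CLIP.
    w = torch.where(seen, torch.full_like(p_in[0], alpha), torch.full_like(p_in[0], beta))
    return p_in ** (1.0 - w) * p_out ** w
```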

Installation

See installation instructions.

Getting Started

See Preparing Datasets for FC-CLIP.

See Getting Started with FC-CLIP.

We also provide a HuggingFace 🤗 Demo for FC-CLIP.

Model Zoo

| model | ADE20K (A-150) PQ | ADE20K (A-150) mAP | ADE20K (A-150) mIoU | Cityscapes PQ | Cityscapes mAP | Cityscapes mIoU | Mapillary Vistas PQ | Mapillary Vistas mIoU | ADE20K-Full (A-847) mIoU | PC-59 mIoU | PC-459 mIoU | PAS-21 mIoU | PAS-20 mIoU | COCO (training dataset) PQ | COCO (training dataset) mAP | COCO (training dataset) mIoU | download |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FC-CLIP | 26.8 | 16.8 | 34.1 | 44.0 | 26.8 | 56.2 | 18.3 | 27.8 | 14.8 | 58.4 | 18.2 | 81.8 | 95.4 | 54.4 | 44.6 | 63.7 | checkpoint |

Citing FC-CLIP

If you use FC-CLIP in your research, please use the following BibTeX entry.

@inproceedings{yu2023fcclip,
  title={Convolutions Die Hard: Open-Vocabulary Panoptic Segmentation with Single Frozen Convolutional CLIP},
  author={Qihang Yu and Ju He and Xueqing Deng and Xiaohui Shen and Liang-Chieh Chen},
  journal={arXiv},
  year={2023}
}

Acknowledgement

Mask2Former (https://github.com/facebookresearch/Mask2Former)

ODISE (https://github.com/NVlabs/ODISE)