From 0333b1fbb29c31809663c4e6de2654b9ff2d27de Mon Sep 17 00:00:00 2001
From: sachinmehta
Date: Thu, 25 Apr 2024 10:08:47 -0700
Subject: [PATCH] Added CatLIP paper links

Done on 2024-04-25 based on 8db2c173f3d8af69eed7c85b94ed2655f8931fa1
---
 README.md                 |  5 ++---
 projects/catlip/README.md | 12 ++++++++++--
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 22269f7..0ce70a7 100644
--- a/README.md
+++ b/README.md
@@ -23,11 +23,10 @@ CoreNet is a deep neural network toolkit that allows researchers and engineers t
 
 ## Research efforts at Apple using CoreNet
 
-Below is the list of publications from Apple that uses CoreNet:
+Below is the list of publications from Apple that use CoreNet. Training and evaluation recipes, as well as links to pre-trained models, can be found inside the [projects](./projects/) folder. Please refer to it for further details.
 
 * [OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework](https://arxiv.org/abs/2404.14619)
-
- * [CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data]()
+ * [CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data](https://arxiv.org/abs/2404.15653)
 * [Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement](https://arxiv.org/abs/2303.08983)
 * [CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement](https://arxiv.org/abs/2310.14108)
 * [FastVit: A Fast Hybrid Vision Transformer using Structural Reparameterization](https://arxiv.org/abs/2303.14189)
diff --git a/projects/catlip/README.md b/projects/catlip/README.md
index c1dae39..254af58 100644
--- a/projects/catlip/README.md
+++ b/projects/catlip/README.md
@@ -1,6 +1,7 @@
 # CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
+[![arXiv](https://img.shields.io/badge/arXiv-2404.15653-a6dba0.svg)](https://arxiv.org/abs/2404.15653)
 
-[CatLIP]() introduces a novel weakly supervised pre-training approach for vision models on web-scale noisy image-text data, *reframing pre-training as a classification task to circumvent computational challenges associated with pairwise similarity computations in contrastive learning*, resulting in a significant 2.7x acceleration in training speed while maintaining high representation quality across various vision tasks.
+`CatLIP` introduces a novel weakly supervised pre-training approach for vision models on web-scale noisy image-text data, *reframing pre-training as a classification task to circumvent computational challenges associated with pairwise similarity computations in contrastive learning*. This reframing yields a 2.7x acceleration in training speed while maintaining high representation quality across various vision tasks.
 
 We provide training and evaluation code along with pretrained models and configuration files for the following tasks:
@@ -15,7 +16,14 @@ We provide training and evaluation code along with pretrained models and configu
 If you find our work useful, please cite:
 
 ```BibTex
-# TODO(sachin): Add CatLIP citation
+@article{mehta2024catlip,
+  title={CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data},
+  author={Sachin Mehta and Maxwell Horton and Fartash Faghri and Mohammad Hossein Sekhavat and Mahyar Najibi and Mehrdad Farajtabar and Oncel Tuzel and Mohammad Rastegari},
+  year={2024},
+  eprint={2404.15653},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV}
+}
 @inproceedings{mehta2022cvnets,
   author = {Mehta, Sachin and Abdolhosseini, Farzad and Rastegari, Mohammad},