
Commit

Added CatLIP paper links
Done on 2024-04-25 based on 8db2c173f3d8af69eed7c85b94ed2655f8931fa1
sachinmehta committed Apr 25, 2024
1 parent 343b6bd commit 0333b1f
Showing 2 changed files with 12 additions and 5 deletions.
5 changes: 2 additions & 3 deletions README.md
@@ -23,11 +23,10 @@ CoreNet is a deep neural network toolkit that allows researchers and engineers t

## Research efforts at Apple using CoreNet

-Below is the list of publications from Apple that uses CoreNet:
+Below is the list of publications from Apple that use CoreNet. Training and evaluation recipes, as well as links to pre-trained models, can be found in the [projects](./projects/) folder; please refer to it for further details.

* [OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework](https://arxiv.org/abs/2404.14619)
-<!-- TODO: add url for CatLIP -->
-* [CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data]()
+* [CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data](https://arxiv.org/abs/2404.15653)
* [Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement](https://arxiv.org/abs/2303.08983)
* [CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement](https://arxiv.org/abs/2310.14108)
* [FastVit: A Fast Hybrid Vision Transformer using Structural Reparameterization](https://arxiv.org/abs/2303.14189)
12 changes: 10 additions & 2 deletions projects/catlip/README.md
@@ -1,6 +1,7 @@
# CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
[![arXiv](https://img.shields.io/badge/arXiv-2404.15653-a6dba0.svg)](https://arxiv.org/abs/2404.15653)

-[CatLIP]() introduces a novel weakly supervised pre-training approach for vision models on web-scale noisy image-text data, *reframing pre-training as a classification task to circumvent computational challenges associated with pairwise similarity computations in contrastive learning*, resulting in a significant 2.7x acceleration in training speed while maintaining high representation quality across various vision tasks.
+`CatLIP` introduces a novel weakly supervised pre-training approach for vision models on web-scale noisy image-text data, *reframing pre-training as a classification task to circumvent computational challenges associated with pairwise similarity computations in contrastive learning*, resulting in a significant 2.7x acceleration in training speed while maintaining high representation quality across various vision tasks.
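The emphasized idea above, classification instead of pairwise contrastive similarity, can be sketched as follows. This is a minimal illustration under assumed names (`VOCAB`, `caption_to_targets`, `bce_loss`), not CatLIP's actual implementation: caption nouns are mapped onto a fixed synset vocabulary to form multi-hot targets, and the image encoder is trained with an independent binary cross-entropy per label.

```python
import math

# Hypothetical noun-to-synset vocabulary (assumption: the paper builds this
# from WordNet synsets extracted from web-scale captions).
VOCAB = {"dog": 0, "ball": 1, "park": 2, "cat": 3}

def caption_to_targets(nouns, vocab=VOCAB):
    """Multi-hot target vector: one independent label per vocabulary entry."""
    targets = [0.0] * len(vocab)
    for noun in nouns:
        if noun in vocab:
            targets[vocab[noun]] = 1.0
    return targets

def bce_loss(logits, targets, eps=1e-7):
    """Mean binary cross-entropy over labels. Cost is O(batch * vocab):
    no batch-wide image-text similarity matrix as in contrastive training."""
    total = 0.0
    for z, t in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid
        total += -(t * math.log(p + eps) + (1.0 - t) * math.log(1.0 - p + eps))
    return total / len(logits)

# Toy step: logits from a (hypothetical) image encoder for one image
# captioned "a dog chasing a ball".
targets = caption_to_targets(["dog", "ball"])
loss = bce_loss([4.0, 3.0, -4.0, -4.0], targets)
```

A CLIP-style contrastive loss would instead require an N x N similarity matrix across the batch; treating labels independently removes that coupling, which is the source of the speedup the paper reports.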

We provide training and evaluation code along with pretrained models and configuration files for the following tasks:

@@ -15,7 +16,14 @@ We provide training and evaluation code along with pretrained models and configu
If you find our work useful, please cite:

```BibTex
-# TODO(sachin): Add CatLIP citation
+@article{mehta2024catlip,
+  title={CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data},
+  author={Sachin Mehta and Maxwell Horton and Fartash Faghri and Mohammad Hossein Sekhavat and Mahyar Najibi and Mehrdad Farajtabar and Oncel Tuzel and Mohammad Rastegari},
+  year={2024},
+  eprint={2404.15653},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV}
+}
@inproceedings{mehta2022cvnets,
author = {Mehta, Sachin and Abdolhosseini, Farzad and Rastegari, Mohammad},
