Stars
Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
API and library for MAVLink compatible systems written in C++17
Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.
A latent text-to-image diffusion model
COYO-700M: Large-scale Image-Text Pair Dataset
An open source implementation of CLIP.
FFCV: Fast Forward Computer Vision (and other ML workloads!)
Prevent PyTorch's `CUDA error: out of memory` in just 1 line of code.
The correct way to resize images or tensors. For NumPy or PyTorch (differentiable).
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine (a usage sketch follows this list).
The Incredible PyTorch: a curated list of tutorials, papers, projects, communities and more relating to PyTorch.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
A data augmentations library for audio, image, text, and video.
torch-optimizer: a collection of optimizers for PyTorch (a usage sketch follows this list).
A PyTorch implementation of the paper Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP)
Official PyTorch implementation of the paper "ImageNet-21K Pretraining for the Masses" (NeurIPS 2021)
This repository contains demos I made with the Transformers library by HuggingFace.
CLIP (Contrastive Language-Image Pretraining): predicts the most relevant text snippet given an image (a usage sketch follows this list).
A topic-centric list of high-quality open datasets.
PyTorch implementation of the 'lightweight' GAN proposed at ICLR 2021. High-resolution image generation with a model that can be trained within a day or two.
An ultrafast, memory-efficient short-read aligner
Is the attention layer even necessary? (https://arxiv.org/abs/2105.02723)
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference
Dataset of frogs on a white background for ML experiments
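
A rough sketch of how the img2dataset entry above is typically driven from Python. This is not a definitive invocation: the url-list file name, output folder, image size, and thread count are placeholder assumptions chosen for illustration.

from img2dataset import download

# Download and resize every image listed in urls.txt into a local image dataset.
# File name, output folder, and parameter values are illustrative placeholders.
download(
    url_list="urls.txt",
    output_folder="images",
    image_size=256,
    thread_count=64,
)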
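
A minimal sketch of the drop-in usage pattern behind the torch-optimizer entry above. The model, the choice of Lamb as the optimizer, and the learning rate are assumptions made only for the example.

import torch
import torch_optimizer as optim

# Optimizers from torch_optimizer are drop-in replacements for torch.optim ones.
# Model, optimizer choice (Lamb), and learning rate below are illustrative placeholders.
model = torch.nn.Linear(10, 2)
optimizer = optim.Lamb(model.parameters(), lr=1e-3)

x = torch.randn(4, 10)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()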
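
A small sketch of the zero-shot prediction idea described in the CLIP entry above, using the openai/CLIP package. The model name "ViT-B/32", the image path, and the candidate captions are placeholder assumptions, not part of the original description.

import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Image path and candidate captions are illustrative placeholders.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a photo of a frog", "a photo of a cat", "a photo of a dog"]).to(device)

with torch.no_grad():
    # Similarity logits between the image and each caption.
    logits_per_image, logits_per_text = model(image, text)
    # The most relevant caption gets the highest probability.
    probs = logits_per_image.softmax(dim=-1)

print(probs)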