Starred repositories
Agent-to-Sim: Learning Interactive Behavior from Casual Videos.
Large Concept Models: Language modeling in a sentence representation space
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
Extended LaTeX template for CVPR/ICCV papers
Bringing BERT into modernity via both architecture changes and scaling
[RSS 2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
Holistic Adaptation of SAM from 2D to 3D for Promptable Medical Image Segmentation
[CVPR 2024 Extension] Datasets of 160K volumes (42M slices), new segmentation datasets, 31M-1.2B pre-trained models, various pre-training recipes, and implementations of 50+ downstream tasks
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline
Qwen2.5-Coder is the code version of Qwen2.5, the large language model series developed by the Qwen team, Alibaba Cloud.
Unified KV Cache Compression Methods for Auto-Regressive Models
Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning"
Official implementation of the paper "Watermark Anything with Localized Messages"
Official Implementation of "ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate"
The repository for the paper titled "Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks"