Stars
Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more, with your permission every step of the way.
An extremely simple method for validation-free efficient adaptation of CLIP-like VLMs that is robust to the learning rate.
A generative world for general-purpose robotics & embodied AI learning.
Official PyTorch implementation of "Video Motion Transfer with Diffusion Transformers"
[ICLR2025] GenPercept: Diffusion Models Trained with Large Data Are Transferable Visual Models
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
SANE is a zero-shot inference pipeline for improving diffusion models with LLM reasoning.
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
[ECCV 2024] BrainHub: Multimodal Brain Understanding Benchmark
Openness Taxonomy of Large Language Models
[EMNLP 2024] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
This is the official repo of the paper "Latent Guard: a Safety Framework for Text-to-image Generation"
[ICLR 2025] HQ-Edit: A High-Quality and High-Coverage Dataset for General Image Editing
[ECCV 2024] UMBRAE: Unified Multimodal Brain Decoding | Unveiling the 'Dark Side' of Brain Modality
This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.
[CVPR 2024 Oral, Best Paper Award Candidate] Official repository of "PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness"
Open-Sora: Democratizing Efficient Video Production for All
Original code base for "On Pretraining Data Diversity for Self-Supervised Learning"
DeepLens: A differentiable lens simulator for end-to-end computational cameras.
Official Code for Stable Cascade
Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs.
InstructPix2Pix with distilled diffusion models
Learning from synthetic data - code and models
Scene-Conditional 3D Object Stylization and Composition (ECCV 2024)
Collection of diffusion model papers, categorized by subarea.
[CVPR 2024] Official repository of "A Simple Recipe for Language-guided Domain Generalized Segmentation"
[CVPR 2024] Official repository of "Material Palette: Extraction of Materials from a Single Real-world Image"