- Baltimore
(UTC -05:00)
Highlights
- Pro
Stars
A one-stop repository for generative AI research updates, interview resources, notebooks and much more!
Video Generation Foundation Models: https://saiyan-world.github.io/goku/
The official PyTorch implementation of the CVPR 2023 paper "Contrastive Grouping with Transformer for Referring Image Segmentation".
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Resources of our paper "FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces". New versions in the making!
VideoAuteur: Towards Long Narrative Video Generation
Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
Quick scripts to calculate CLIP text-image similarity
[ICLR 2025] Diffusion Feedback Helps CLIP See Better
A general fine-tuning kit geared toward diffusion models.
A generative world for general-purpose robotics & embodied AI learning.
MuMA-ToM: Multi-modal Multi-Agent Theory of Mind
[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
This repository contains the code and data for the paper "EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control" by Haozhe Chen, Run Chen, and Julia Hirschberg.
Public code release associated with SceneScript.
Project Aria Social Eye Tracking Model
SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement
projectaria_tools is a C++/Python open-source toolkit to interact with Project Aria data
Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image
A latent text-to-image diffusion model
[NeurIPS D&B Track 2024] Source code for the paper "Constrained Human-AI Cooperation: An Inclusive Embodied Social Intelligence Challenge"
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Codebase for the paper "Elucidating the Design Space of Language Models for Image Generation"