Stars
Wan: Open and Advanced Large-Scale Video Generative Models
Video-R1: Towards Super Reasoning Ability in Video Understanding MLLMs
Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
Understand Human Behavior to Align True Needs
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
Materials for the Hugging Face Diffusion Models Course
STAR: Scale-wise Text-to-image generation via Auto-Regressive representations
Refine high-quality datasets and visual AI models
The simplest, fastest repository for training/finetuning medium-sized GPTs.
[NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
LenslessFace: An End-to-End Optimized Lensless System for Privacy-Preserving Face Verification
[CVPR 2025] StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
A PyTorch native library for large model training
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch
Generate high-definition short videos with one click using large AI models.
[CSUR] A Survey on Video Diffusion Models
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
Collection of awesome test-time (domain/batch/instance) adaptation methods
✨✨Latest Advances on Multimodal Large Language Models
Implementation of Generating Diverse High-Fidelity Images with VQ-VAE-2 in PyTorch
Meta-Transformer for Unified Multimodal Learning