- The University of Hong Kong
- Hong Kong
- https://xieenze.github.io/
Stars
Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
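The officially designated global GitHub of Runxue (润学, "the study of running," i.e. emigrating). It compiles Runxue's tenets, program, and theory, along with real-world examples of "running," and addresses three big questions: why run, where to run, and how to run. It aims to become the core religion and core belief of the new Chinese people.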
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Refine high-quality datasets and visual AI models
🔥🔥🔥Official Codebase of "DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation"
Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
Efficient Fine-tuning LLaMA Using DiffFit within 0.7M Parameters
Implementation of "DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning"
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
Edit anything in images powered by segment-anything, ControlNet, StableDiffusion, etc. (ACM MM)
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
[CVPR 2023] An academic alternative to Tesla's occupancy network for autonomous driving.