xieenze


Let's finetune video generation models!

Python 418 15 Updated Feb 24, 2025

Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…

Jupyter Notebook 7,669 493 Updated Mar 7, 2025

Official repository for LTX-Video

Python 3,126 273 Updated Mar 5, 2025

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

Python 3,545 213 Updated Mar 12, 2025

[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

Python 243 7 Updated Jan 22, 2025

📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion

Python 2,068 158 Updated Mar 6, 2025

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Python 3,002 244 Updated Mar 7, 2025

[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…

Jupyter Notebook 6,914 444 Updated Jan 12, 2025

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Python 1,776 87 Updated Oct 31, 2024

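The official designated GitHub of Runxue worldwide ("run" being slang for emigrating). It compiles Runxue's tenets, program, theory, and assorted examples of "running", and addresses three big questions: why to run, where to run, and how to run. It aims to become the core religion and core belief of the new Chinese people.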

31,922 2,622 Updated Jul 31, 2024
JavaScript 13 Updated Mar 23, 2024

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)

Python 777 45 Updated Jul 29, 2024

✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models

532 52 Updated Dec 31, 2024

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Python 4,469 234 Updated Jun 14, 2024

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Python 2,998 189 Updated Oct 31, 2024

Refine high-quality datasets and visual AI models

Python 9,274 606 Updated Mar 13, 2025

🔥🔥🔥Official Codebase of "DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation"

Python 255 19 Updated May 17, 2024

Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts

Python 321 36 Updated Aug 1, 2023

Efficient Fine-tuning LLaMA Using DiffFit within 0.7M Parameters

Jupyter Notebook 10 2 Updated May 7, 2023

Implementation of "DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning"

Python 85 10 Updated Sep 10, 2023

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Python 3,183 260 Updated Jan 18, 2025

Edit anything in images powered by segment-anything, ControlNet, StableDiffusion, etc. (ACM MM)

Python 3,367 195 Updated Feb 23, 2025
Python 488 72 Updated Feb 21, 2025
Python 158 18 Updated Feb 21, 2025
Python 76 9 Updated Mar 27, 2024

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

Python 1,310 73 Updated Jan 17, 2024

Inference code for Llama models

Python 57,843 9,714 Updated Jan 26, 2025

Let us control diffusion models!

Python 31,715 2,836 Updated Feb 25, 2024

[CVPR 2023] An academic alternative to Tesla's occupancy network for autonomous driving.

Python 1,232 112 Updated Sep 7, 2024