- The Hong Kong University of Science and Technology (Guangzhou)
- Guangzhou, Guangdong, China
- https://scholar.google.com.hk/citations?user=hmUOaNcAAAAJ&hl=zh-CN
Starred repositories
GenEval: An object-focused framework for evaluating text-to-image alignment
PyTorch implementation of MaskGIT: Masked Generative Image Transformer (https://arxiv.org/pdf/2202.04200.pdf)
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
The official implementation of Distribution Backtracking Distillation for One-step Diffusion Models
The dataset and code for "Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset"
A library for easily merging multiple LLM experts and efficiently training the merged LLM.
Official implementation of the paper "DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models"
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
[ICLR 2025] Autoregressive Video Generation without Vector Quantization
[CVPR 2024] On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
A generative world for general-purpose robotics & embodied AI learning.
FastVideo is a lightweight framework for accelerating large video diffusion models.
(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
[NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning".
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Allegro is a powerful text-to-video model that generates high-quality videos of up to 6 seconds at 15 FPS and 720p resolution from simple text input.
Official PyTorch implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think (ICLR 2025)
HunyuanVideo: A Systematic Framework For Large Video Generation Model
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; Pytorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
A collection of diffusion model papers categorized by subarea
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
A high-throughput and memory-efficient inference and serving engine for LLMs
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream d…
A summary of related work on flow matching and stochastic interpolants
[ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".
The code of our work "Golden Noise for Diffusion Models: A Learning Framework".