Stars
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
HunyuanVideo: A Systematic Framework For Large Video Generation Model
[NeurIPS'23 Spotlight] "Mind2Web: Towards a Generalist Agent for the Web"
Janus-Series: Unified Multimodal Understanding and Generation Models
Implementation of rectified flow and some of its follow-up research / improvements in PyTorch
Efficient vision foundation models for high-resolution generation and perception.
Rectified Diffusion: Straightness Is Not Your Need
Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
Official PyTorch implementation of AdaFlow
[ECCV 2024] Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
"SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow", Yuanzhi Zhu, Xingchao Liu, Qiang Liu
An open-source toolbox for fast sampling of diffusion models. Official implementations of our works published in ICML, NeurIPS, CVPR.
(CVPR 2024) 🧩 TokenCompose: Text-to-Image Diffusion with Token-level Supervision
Mixed continuous/categorical flow-matching model for de novo molecule generation.
[NeurIPS 2024] RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance
Open-Sora: Democratizing Efficient Video Production for All
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
🔥[IJCAI 2022, Official Code] for paper "Rethinking Image Aesthetics Assessment: Models, Datasets and Benchmarks". Official Weights and Demos provided. The first aesthetics assessment dataset, algorithm, and benchmark for multi-theme scenarios.
Official code for paper: Text-to-Image Rectified Flow as Plug-and-Play Priors
The codebase of our paper "Improving the Training of Rectified Flows", NeurIPS 2024
[ICLR 2024] Official implementation of Bellman Optimal Stepsize Straightening of Flow-Matching Models
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Denoising Diffusion Step-aware Models (ICLR2024)
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text