Starred repositories
Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Official implementation of MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
The official code for "How Control Information Influences Multilingual Text Image Generation and Editing?"
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from an image prompt.
Official inference repo for FLUX.1 models
MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)
Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
salaniz / pycocoevalcap
Forked from tylin/coco-caption
Python 3 support for the MS COCO caption evaluation tools
Lumina-T2X is a unified framework for Text to Any Modality Generation
Open-Sora: Democratizing Efficient Video Production for All
MiniSora: A community that aims to explore the implementation path and future development direction of Sora.
[ECCV 2024] Official Repository for DiffiT: Diffusion Vision Transformers for Image Generation
Diffusion model papers, survey, and taxonomy
VideoSys: An easy and efficient system for video generation
Algorithm practice comes down to patterns, and labuladong is all you need! English version supported! Crack LeetCode, not only how, but also why.
Materials for the Hugging Face Diffusion Models Course
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
Curated list of awesome resources for the Stable Diffusion AI Model.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance
a state-of-the-art-level open visual language model | multimodal pretrained model
Generative Models by Stability AI