Starred repositories
An Open-source Toolkit for LLM Development
ComfyUI's ControlNet Auxiliary Preprocessors
Train high-quality text-to-image diffusion models in a data & compute efficient manner
A high-quality, one-stop open-source data extraction tool that converts PDF to Markdown and JSON.
GLM-4 series: Open Multilingual Multimodal Chat LMs
Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
MiniSora: A community that aims to explore the implementation path and future development direction of Sora.
VideoSys: An easy and efficient system for video generation
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Large-scale text-video dataset. 10 million captioned short videos.
Character Animation (AnimateAnyone, Face Reenactment)
Official implementation code of the paper "AnyText: Multilingual Visual Text Generation And Editing"
Master C++ :punch:. A study repository for C++ Primer, 5th Edition (Chinese edition), including notes and answers to the end-of-chapter exercises.
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
[CSUR] A Survey on Video Diffusion Models
Generative Models by Stability AI
✨✨ Latest Advances on Multimodal Large Language Models
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
AAAI 2024: Visual Instruction Generation and Correction