Lists (31)
agent
AIGC
awesome list
Blind-SR
codebook
common: improvements to network architecture
conference: collected conference papers
dataset
DataSets
diffusion
FFT
Generation: image generation
image blending
K-cluserter
LoRA
meta-learning
MLP
NeRF
NLP&CV
personalization
Remote-sensing
sketch2img
SR
story book
Tools
uncertainty-driven
Unsupervised-SR
video
Ethnicity classification
Multimodal large models
Face beautification
Starred repositories
🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
[CVPR 2023] Official implementation of the paper "Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation"
🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org
Liquid: Language Models are Scalable and Unified Multi-modal Generators
Implementation of [CVPR 2025] "DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation"
🔥 InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
Code Implementation of "PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data"
OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting
A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
[Arxiv'25] BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing
A live stream development of RL tuning for LLM agents
Implementation of "EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer"
Code for SCIS-2025 Paper "UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation".
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
No fortress, purely open ground. OpenManus is Coming.
Build production-ready AI agents in both Python and Typescript
Wan: Open and Advanced Large-Scale Video Generative Models
Build resilient language agents as graphs.
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Solve Visual Understanding with Reinforced VLMs
Lightweight framework for building Agents with memory, knowledge, tools and reasoning.
FastGPT is a knowledge-based platform built on the LLMs, offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, le…
Video Generation Foundation Models: https://saiyan-world.github.io/goku/
🔥ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt
An Agentic Deep Research Assistant similar to Gemini and OpenAI Deep Research
Magic to turn Cursor/Windsurf as 90% of Devin