Stars
✨✨Latest Advances on Multimodal Large Language Models
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-contex…
Source code for paper: "AltDiffusion: A multilingual Text-to-Image diffusion model"
Masked Diffusion Transformer is the SOTA for image synthesis. (ICCV 2023)
Reading notes on a wide range of research topics: domain generalization, domain adaptation, causality, robustness, prompt, optimization, and generative models.
[CVPR 2024] Official PyTorch implementation of FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
A curated list of papers, code and resources pertaining to few-shot image generation.
Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting
A collection of resources on controllable generation with text-to-image diffusion models.
[CVPR 2024] Official PyTorch implementation of "ECLIPSE: Revisiting the Text-to-Image Prior for Efficient Image Generation"
ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation (ICCV 2023, Oral)
[TMLR] Official PyTorch implementation of "λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space"
Code for paper LAFITE: Towards Language-Free Training for Text-to-Image Generation (CVPR 2022)
Official codebase for the Paper “Retrieval-Augmented Diffusion Models”
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
A collection of awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
Retrieval augmented diffusion from CompVis.
Source code and dataset for ACL 2019 paper "ERNIE: Enhanced Language Representation with Informative Entities"
A library for efficient similarity search and clustering of dense vectors.
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
OpenMMLab Detection Toolbox and Benchmark
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A Chinese-English bilingual multimodal large model series built on the CPM foundation models
Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
This is the first chat model fine-tuned specifically for Chinese through ORPO, based on the Meta-Llama-3-8B-Instruct model.
Get up and running with Llama 3.3, Mistral, Gemma 2, and other large language models.