Starred repositories
A collection of resources and papers on Diffusion Models
👀 Visual Instruction Inversion: Image Editing via Visual Prompting (NeurIPS 2023)
A collection of resources on controllable generation with text-to-image diffusion models.
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
animatediff prompt travel
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Create images of a given character in different poses
dilithjay / Sinhala-ParSeq
Forked from baudm/parseq
Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)
Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
Official implementation for "Automatic Chain of Thought Prompting in Large Language Models" (stay tuned & more will be updated)
[AAAI 2024 Oral] AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models
The Codes and Data of The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection
③[ICML2024] [IQA, IAA, VQA] All-in-one Foundation Model for visual scoring. Can efficiently fine-tune to downstream datasets.
Generative Models by Stability AI
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
Accepted as [NeurIPS 2024] Spotlight Presentation Paper
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
An Open-source Toolkit for LLM Development
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"