DAVID-Hown

Starred repositories

A collection of resources and papers on Diffusion Models

HTML 11,338 955 Updated Aug 1, 2024

👀 Visual Instruction Inversion: Image Editing via Visual Prompting (NeurIPS 2023)

Python 88 2 Updated Dec 19, 2023

A collection of resources on controllable generation with text-to-image diffusion models.

970 27 Updated Dec 31, 2024

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

2,247 196 Updated Jan 6, 2025

animatediff prompt travel

Python 1,194 104 Updated Jan 13, 2024

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.

Jupyter Notebook 5,552 351 Updated Jun 28, 2024

Create images of a given character in different poses

Python 626 65 Updated Jun 5, 2024

Focus on prompting and generating

Python 42,685 6,248 Updated Jan 14, 2025

Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)

Python 2 Updated Jun 23, 2023

Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

Python 5,982 852 Updated May 13, 2024

PyTorch implementation of MAE https://arxiv.org/abs/2111.06377

Python 7,500 1,233 Updated Jul 23, 2024

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

Python 350 15 Updated Jan 14, 2025

33 Updated Jan 10, 2025

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 6,589 575 Updated Jan 11, 2025

Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)

Python 121 4 Updated Nov 13, 2023

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.

Python 6,823 524 Updated Dec 25, 2024

Official implementation for "Automatic Chain of Thought Prompting in Large Language Models" (stay tuned & more will be updated)

Jupyter Notebook 1,666 150 Updated Mar 13, 2024

[AAAI 2024 Oral] AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models

Python 855 107 Updated Dec 20, 2023

The Codes and Data of The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection

Python 49 3 Updated Jan 8, 2025

③[ICML2024] [IQA, IAA, VQA] All-in-one Foundation Model for visual scoring. Can efficiently fine-tune to downstream datasets.

Python 336 24 Updated Aug 12, 2024

Generative Models by Stability AI

Python 25,107 2,783 Updated Sep 4, 2024

Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)

Python 25,534 2,926 Updated Sep 2, 2024

Accepted as [NeurIPS 2024] Spotlight Presentation Paper

Jupyter Notebook 6,121 616 Updated Sep 26, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 4,237 259 Updated Jan 11, 2025

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 37,514 4,592 Updated Jan 18, 2025

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 5,323 404 Updated Aug 7, 2024

An Open-source Toolkit for LLM Development

Python 2,747 176 Updated Jan 13, 2025

Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"

Python 787 109 Updated Jun 30, 2021