Stars
「ICLR 2025」 A Sanity Check for AI-generated Image Detection
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…
[CVPR2023] Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective
The public code for "PromptIQA: Boosting the Performance and Generalization for No-Reference Image Quality Assessment via Prompts"
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Logo detection using YOLOv7 with LogoDet-3K and Flickr Logos 27.
A brand-independent logo detection model
Brand Visibility in Packaging: A Deep Learning Approach for Logo Detection, Saliency-Map Prediction, and Logo Placement Analysis
A comprehensive collection of IQA papers
ImageBind: One Embedding Space to Bind Them All
DataComp: In search of the next generation of multimodal datasets
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of the Open World
[COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
The unofficial python package that returns response of Google Bard through cookie value.
🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
✨✨Latest Advances on Multimodal Large Language Models
A large-scale 7B pretrained language model developed by BaiChuan-Inc.
YoungerGao / Bert-Chinese-Text-Classification-Pytorch
Forked from 649453932/Bert-Chinese-Text-Classification-Pytorch. Chinese text classification using Bert and ERNIE.
VRT: A Video Restoration Transformer (official repository)