Stars
【ArXiv】PDF-Wukong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling
An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
A Survey on Deepfake Generation and Detection
Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train Dataset for table understanding and develop a generalist tab…
Lumina-T2X is a unified framework for Text to Any Modality Generation
【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Open-Sora: Democratizing Efficient Video Production for All
[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"
Codebase for fine-tuning / evaluating nougat-based image2latex generation models
Papers published by members at top CV conferences such as ICCV and CVPR, and results from competitions such as ICDAR
zigchang / HumanBench
Forked from OpenGVLab/HumanBench
This repo is the official implementation of HumanBench (CVPR 2023)
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
A PyTorch re-implementation of Real-time Scene Text Detection with Differentiable Binarization
Unofficial PyTorch implementation of 2D Attentional Irregular Scene Text Recognizer
[ECCV 2018] CCPD: a diverse and well-annotated dataset for license plate detection and recognition
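Face recognition based on key facial region extraction (LFW: 99.82%+, CFP_FP: 98.50%+, AgeDB30: 98.25%+)
2019 CCF-BDCI competition source code from team 天晨破晓, winner of the Best Innovation and Exploration Award and champion of the OCR-based ID card element extraction track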
A PyTorch implementation of "Real-time Scene Text Detection with Differentiable Binarization".
wyc2015fq / DewarpNet
Forked from cvlab-stonybrook/DewarpNet
Code for the paper "DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks" (ICCV '19)
💎 1MB lightweight face detection model
Train code of face anti-spoofing with a single RGB frame
HED and RCF implementation for edge detection in TensorFlow
ChaLearn Face Anti-spoofing Attack Detection Challenge@CVPR2019