ShanghaiTech University
Shanghai, China
(UTC +08:00)
Stars
[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation, 2024
Collection of AWESOME vision-language models for vision tasks
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Open-source and strong foundation image recognition models.
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
A programming framework for agentic AI 🤖 PyPI: autogen-agentchat; Discord: https://aka.ms/autogen-discord; Office Hour: https://aka.ms/autogen-officehour
Unofficial implementation of "Few-Shot Head Swapping in the Wild"
Paper list and datasets for industrial image anomaly/defect detection (continuously updated).
A simple screen parsing tool towards pure vision based GUI agent
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
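A computer vision closed-loop learning platform where code can be run interactively online. Closed-loop learning: the Chinese-language e-book "Computer Vision in Action: Algorithms and Applications", with source code and a reader community (continuously updated ...) 📘 Online e-book: https://charmve.github.io/computer-vision-in-acti…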
COCO 2017 dataset labeled for face detection
A PyTorch implementation that adds new features to the Segment-Anything code. The features support batch input on the full-grid prompt (automatic mask generation) with post-process…
Real World Occluded Face dataset containing 3195 neutral images, 1686 sunglasses images and 678 masked images.
A summary of code and papers for salient object detection with deep learning.
DCGM / ffhq-features-dataset
Forked from NVlabs/ffhq-dataset
Gender, Age, and Emotion for Flickr-Faces-HQ Dataset (FFHQ)
🏂🏻 Handbook for programmers seeking jobs overseas and preparing for English-language interviews
High-resolution models for human tasks.
[ECCV 2024] The official code of paper "Open-Vocabulary SAM".
This repository contains demos I made with the Transformers library by HuggingFace.
Effortless data labeling with AI support from Segment Anything and other awesome models.