University of Lincoln, Lincoln
https://agriforwards-cdt.blogs.lincoln.ac.uk/cdt-personal/xumin-gao/
Stars
A curated list of publications and resources on open-vocabulary semantic segmentation and related areas (e.g. zero-shot semantic segmentation).
Meta AI's SAM + automatic mask generation (AMG) + average patch embedding per instance segment + clustering = semantic segmentation (see the first sketch after this list).
A low-cost, AI-powered robotic arm assistant that listens to your voice commands and can carry out a variety of tabletop tasks.
[AAAI 2023] Exploring CLIP for Assessing the Look and Feel of Images
👁️ 🖼️ 🔥 PyTorch Toolbox for Image Quality Assessment, including PSNR, SSIM, LPIPS, FID, NIQE, NRQM (Ma), MUSIQ, TOPIQ, NIMA, DBCNN, BRISQUE, PI, and more.
[CVPR2023] Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective
Uncertainty estimation for anchor-based deep object detectors.
Code for our paper "A Review and Comparative Study on Probabilistic Object Detection in Autonomous Driving".
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Tactile Sensing and Simulation; Visual Tactile Manipulation; Open Source.
A data generation pipeline for creating semi-realistic synthetic multi-object videos with rich annotations such as instance segmentation masks, depth maps, and optical flow.
[ACM MM23] CLIP-Count: Towards Text-Guided Zero-Shot Object Counting
Includes FSC-147-D and the code for training and testing the CounTX model from the paper "Open-world Text-specified Object Counting".
An experiment combining CLIP with SAM for open-vocabulary image segmentation (see the second sketch after this list).
Related papers and codes for vision-based robotic grasping
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
[CVPR 2024 Highlight] Official PyTorch implementation of SpatialTracker: Tracking Any 2D Pixels in 3D Space
FILM: Frame Interpolation for Large Motion, in ECCV 2022.
The Arcade Learning Environment (ALE) -- a platform for AI research.
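
The SAM-to-semantics recipe above is compact enough to sketch end to end. Below is a minimal sketch, assuming DINO ViT-S/16 (via torch.hub) as the patch-embedding backbone and scikit-learn's KMeans as the clustering step; the referenced repo may use different components, so treat the checkpoint names and hyperparameters as illustrative.

```python
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

device = "cuda" if torch.cuda.is_available() else "cpu"

def sam_clusters_to_semantics(image: np.ndarray, n_classes: int = 8) -> np.ndarray:
    """image: HxWx3 uint8 RGB -> HxW label map (-1 = unassigned)."""
    # 1) SAM + automatic mask generation (AMG): class-agnostic instance masks.
    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to(device)
    segments = SamAutomaticMaskGenerator(sam).generate(image)

    # 2) Dense patch embeddings from a self-supervised ViT (assumed: DINO ViT-S/16).
    vit = torch.hub.load("facebookresearch/dino:main", "dino_vits16").to(device).eval()
    side = 224  # resize so the input is a multiple of the 16 px patch size
    x = torch.from_numpy(image).permute(2, 0, 1).float()[None].to(device) / 255.0
    x = F.interpolate(x, size=(side, side), mode="bilinear", align_corners=False)
    with torch.no_grad():
        tokens = vit.get_intermediate_layers(x, n=1)[0]          # [1, 1+N, D]
    grid = side // 16
    feats = tokens[0, 1:].reshape(grid, grid, -1).cpu().numpy()  # drop CLS token

    # 3) Average the patch embeddings that fall under each instance mask.
    per_segment = []
    for seg in segments:
        m = torch.from_numpy(seg["segmentation"].astype(np.float32))[None, None]
        m = F.interpolate(m, size=(grid, grid), mode="nearest")[0, 0].numpy() > 0.5
        per_segment.append(feats[m].mean(axis=0) if m.any() else feats.mean(axis=(0, 1)))

    # 4) Cluster the segment embeddings; each cluster id acts as a pseudo-class.
    k = min(n_classes, len(per_segment))
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(np.stack(per_segment))

    out = np.full(image.shape[:2], -1, dtype=np.int32)
    for seg, lab in zip(segments, labels):
        out[seg["segmentation"]] = lab  # paint cluster ids back onto pixels
    return out
```

Note the labels are cluster ids, not class names: this variant is purely unsupervised, which is what distinguishes the recipe from text-driven approaches.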
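A matching sketch of the CLIP + SAM experiment: SAM proposes class-agnostic masks, CLIP scores each masked crop against free-form text prompts, and the best-scoring prompt labels the mask. The checkpoint names, prompt template, and background-blanking crop strategy are illustrative assumptions, not the repo's exact method.

```python
import numpy as np
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def open_vocab_segment(image: np.ndarray, class_names: list[str]):
    """image: HxWx3 uint8 RGB -> list of (SAM segment dict, predicted class name)."""
    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth").to(device)
    segments = SamAutomaticMaskGenerator(sam).generate(image)

    # Embed one prompt per class; this forms the open vocabulary.
    text = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
    with torch.no_grad():
        t = model.encode_text(text).float()
    t = t / t.norm(dim=-1, keepdim=True)

    results = []
    for seg in segments:
        x0, y0, w, h = (int(v) for v in seg["bbox"])  # XYWH box around the mask
        crop = image[y0:y0 + h, x0:x0 + w].copy()
        crop[~seg["segmentation"][y0:y0 + h, x0:x0 + w]] = 0  # zero out background
        inp = preprocess(Image.fromarray(crop)).unsqueeze(0).to(device)
        with torch.no_grad():
            im = model.encode_image(inp).float()
        im = im / im.norm(dim=-1, keepdim=True)
        results.append((seg, class_names[int((im @ t.T).argmax())]))  # cosine sim
    return results
```

Unlike the clustering sketch above, the vocabulary here is supplied at query time as plain text, so new classes cost nothing beyond a new prompt.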