Stars
Solve Visual Understanding with Reinforced VLMs
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Code for running baseline models/experiments with the Fields of The World dataset
[CVPR 2025] SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images
Geospatial library wheels for Python on Windows.
UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery, ISPRS. Also, including other vision transformers and CNNs for satellite, aerial image …
[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
ComfyUI-IF_AI_tools is a set of custom nodes for ComfyUI that allows you to generate prompts using a local Large Language Model (LLM) via Ollama. This tool enables you to enhance your image generat…
Official code of Remote Sensing Mamba
[NeurIPS 2024] Code release for "Segment Anything without Supervision"
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
[CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
Start building LLM-empowered multi-agent applications in an easier way.
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
An intelligent assistant serving the entire software development lifecycle, powered by a Multi-Agent Framework, working with DevOps Toolkits, Code&Doc Repo RAG, etc.
[AAAI 2025] Official PyTorch implementation of "TinySAM: Pushing the Envelope for Efficient Segment Anything Model"
[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
Personalize Segment Anything Model (SAM) with 1 shot in 10 seconds
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
« usbkill » is an anti-forensic kill-switch that waits for a change on your USB ports and then immediately shuts down your computer.
Infinite Photorealistic Worlds using Procedural Generation
ArcGIS Python Toolbox for WhiteboxTools
Cesium development template based on Vue CLI 4.x.x+ and Electron 6.x.x+
WebGL point cloud viewer for large datasets