Stars
Start building LLM-empowered multi-agent applications in an easier way.
Infinite Photorealistic Worlds using Procedural Generation
« usbkill » is an anti-forensic kill-switch that waits for a change on your USB ports and then immediately shuts down your computer.
Solve Visual Understanding with Reinforced VLMs
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Personalize Segment Anything Model (SAM) with 1 shot in 10 seconds
Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything
An intelligent assistant serving the entire software development lifecycle, powered by a Multi-Agent Framework, working with DevOps Toolkits, Code&Doc Repo RAG, etc.
[CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery, ISPRS. Also, including other vision transformers and CNNs for satellite, aerial image …
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
ComfyUI-IF_AI_tools is a set of custom nodes for ComfyUI that allows you to generate prompts using a local Large Language Model (LLM) via Ollama. This tool enables you to enhance your image generat…
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
[AAAI 2025] Official PyTorch implementation of "TinySAM: Pushing the Envelope for Efficient Segment Anything Model"
ArcGIS Python Toolbox for WhiteboxTools
Official code of Remote Sensing Mamba
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
[CVPR 2025] SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images