Stars
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
A hands-on, step-by-step Huggingface Transformers course; videos are updated in sync on Bilibili and YouTube
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Complete Open Source and Modular solution for MMO
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
Python scripts for the Segment Anything 2 (SAM2) model in ONNX
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
The modified differentiable Gaussian rasterization in the CVPR 2024 highlight paper: GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting.
[CVPR 2024] Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras
Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots
Algorithm to texture 3D reconstructions from multi-view stereo images
Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)
A collaboration-friendly studio for NeRFs
Vector Quantized VAEs - PyTorch Implementation
Text2Room generates textured 3D meshes from a given text prompt using 2D text-to-image models (ICCV2023).
This repository contains the code for the paper "Occupancy Networks - Learning 3D Reconstruction in Function Space"
Teaching robots to respond to open-vocab queries with CLIP and NeRF-like neural fields
🐙 Guides, papers, lectures, notebooks and resources for prompt engineering
Community for applying LLMs to robotics and a robot simulator with ChatGPT integration
Most popular metrics used to evaluate object detection algorithms.