Stars
OpenEMMA, a permissively licensed open source reproduction of Waymo’s EMMA model.
Official repo for "VisionZip: Longer is Better but Not Necessary in Vision Language Models"
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
Inpaint images with ControlNet
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
A curated list of 3D vision papers related to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites
[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model
[NeurIPS 2024 D&B] Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
Roadmap to become a Visual-SLAM developer in 2023
Highly recommended resources for SLAM newbies (Lecture, Reviewed paper, Books, Tutorial, etc)
urbste / ORB_SLAM3
Forked from UZ-SLAMLab/ORB_SLAM3
ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM
Tools to distill the Hiera transformer backbone to CNNs that are easier to deploy on the edge.
Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots
A curated list of awesome SLAM tutorials, projects and communities.
[NeurIPS 2024] A Generalizable World Model for Autonomous Driving
ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation
EDM2 and Autoguidance -- Official PyTorch implementation
High-resolution models for human tasks.
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.
An example RLDS dataset builder for X-embodiment dataset conversion.
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…