Stars
Generic Keyboard Teleop for ROS
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators
Official code and checkpoint release for mobile robot foundation models: GNM, ViNT, and NoMaD.
openvla / openvla
Forked from TRI-ML/prismatic-vlms. OpenVLA: An open-source vision-language-action model for robotic manipulation.
This repository provides the code associated with the paper "VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation" (ICRA 2024)
Democratization of RT-2: "RT-2: New model translates vision and language into action"
Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources
Dobb·E: An open-source, general framework for learning household robotic manipulation
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
✨✨Latest Advances on Multimodal Large Language Models
AIGC-interview/CV-interview/LLMs-interview: a collection of interview questions and answers, plus new ideas, questions, resources, and projects arising from work and research
This repository compiles papers on the application of video technology in robotics! Star⭐ the repo and follow me if you like what you see🤩.
A large-scale benchmark and learning environment.
ALFRED - A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
Simple and easily configurable grid world environments for reinforcement learning
[ICML 2024] Official code repository for 3D embodied generalist agent LEO
Code for the Paper M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models.
AI2-THOR Data Collection Tool Based On Keyboard Interaction
[NeurIPS 2023] We use large language models as a commonsense world model and heuristic policy within Monte Carlo Tree Search, enabling better-reasoned decision-making for daily task planning problems.
A curated list for vision-and-language navigation. ACL 2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions"
A generative world for general-purpose robotics & embodied AI learning.
Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"