Stars
Official implementation of Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation (CVPR'22 Oral).
An official repo for the ECCV 2024 paper Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
Cornell Touchdown natural language navigation and spatial reasoning dataset.
Code for the RSS 2018 paper on the Grounded Semantic Mapping Network
[RSS 2024] NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation
AI Research Platform for Reinforcement Learning from Real Panoramic Images.
The repository provides code associated with the paper VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation (ICRA 2024)
[TPAMI 2024] Official repo of "ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments"
Code for the Habitat Challenge
EventGPT: Event Stream Understanding with Multimodal Large Language Models
Vision-and-Language Navigation in Continuous Environments using Habitat
Ideas and thoughts about the fascinating Vision-and-Language Navigation
openvla / openvla
Forked from TRI-ML/prismatic-vlms. OpenVLA: An open-source vision-language-action model for robotic manipulation.
Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Get up and running with Llama 3.3, Phi 4, Gemma 2, and other large language models.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
A programming framework for agentic AI 🤖 | PyPI: autogen-agentchat | Discord: https://aka.ms/autogen-discord | Office Hour: https://aka.ms/autogen-officehour
A collection of phenomena observed during the scaling of big foundation models, which may be developed into consensus, principles, or laws in the future
An LLM cookbook for building your own from scratch, all the way from gathering data to training a model
MobiLlama: Small Language Model tailored for edge devices