- Shanghai AI Lab
- Shanghai
- https://siyuanhuang95.github.io/
- https://scholar.google.com/citations?user=QNkS4KEAAAAJ&hl=en
Stars
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
A Benchmark for Evaluating Generalization in Robotic Manipulation
Implementation of a framework for Genie2 in PyTorch
[ICLR 2025 Oral] Seer: Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Beamer template for Shanghai Jiao Tong University
ControlNet++: All-in-one ControlNet for image generation and editing!
DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving
A curated list of papers on the application of video technology in robotics.
Visualizing the DROID dataset using Rerun
[CoRL 2024] Official repo of `A3VLM: Actionable Articulation-Aware Vision Language Model`
[NeurIPS 2024] A Generalizable World Model for Autonomous Driving
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)
Simulation software (ROS/MATLAB) for HECTOR humanoid robot locomotion control: bipedal locomotion and force-and-moment-based MPC
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
Unified framework for robot learning built on NVIDIA Isaac Sim
[ICLR 2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Open-Sora: Democratizing Efficient Video Production for All
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
[IROS 2024 Oral] ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models
An Open-source Toolkit for LLM Development
Data pre-processing and training code on Open X-Embodiment with PyTorch