VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks (Early Preview Version!)
🚨 NOTICE: 🎁 The early preview version is released on my birthday (12.25) as a gift to myself🎄! Most of the code is still being organized, or even reconstructed, for a more robust and user-friendly version. (Sorry, I've been so busy these days.) The complete version will be open-sourced around the Chinese Lunar New Year🧧!
I don't like the phrase "code coming soon"; too often it means the code never actually appears on GitHub, which can be quite frustrating. So this early version is my promise.
🎓 Paper | 🌐 Project Website | 🤗 Hugging Face
- 2025/2/26 Released the reference evaluation pipeline.
- 2025/2/14 Released the scripts for trajectory generation.
- 2024/12/25 The preview version of VLABench has been released! It showcases most of the designed tasks and structure, but the functionalities are still being organized and tested.
- Prepare conda environment
conda create -n vlabench python=3.10
conda activate vlabench
git clone https://github.com/OpenMOSS/VLABench.git
cd VLABench
pip install -r requirements.txt
pip install -e .
- Download the assets
python scripts/download_assets.py
The script will automatically download the necessary assets and unzip them into the correct directory.
- (Optional) Initialize submodules
git submodule update --init --recursive
This will pull in the other policy repos, such as openpi.
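After these steps, you can sanity-check the editable install with a quick import (assuming the top-level package is named VLABench, matching the repo name):
python -c "import VLABench; print('VLABench imported successfully')"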
Some tips from our experience setting up the Octo evaluation environment:
conda env remove -n octo
conda create -n octo python=3.10
conda activate octo
pip install -e .
pip install "jax[cuda12_pip]==0.4.20" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html flax==0.7.5
pip install tensorflow==2.15.0
pip install dlimp@git+https://github.com/kvablack/dlimp@5edaa4691567873d495633f2708982b42edf1972
pip install distrax==0.1.5
pip install tensorflow_probability==0.23.0
pip install scipy==1.12.0
pip install einops==0.6.1
pip install transformers==4.34.1
pip install ml_collections==0.1.0
pip install wandb==0.12.14
pip install matplotlib
pip install gym==0.26
pip install plotly==5.16.1
pip install orbax-checkpoint==0.4.0
Note: the cuda12_pip extra in the jax install command above may be replaced by another variant appropriate for your machine; refer to the jax installation guide.
Make sure the jax version is 0.4.20 and the flax version is 0.7.5:
pip show jax flax jaxlib
Run this to verify the installation succeeded:
python -c "from octo.model.octo_model import OctoModel; model = OctoModel.load_pretrained('hf://rail-berkeley/octo-base-1.5'); print('Model loaded successfully')"
We provide a brief tutorial in tutorials/2.auto_trajectory_generate.ipynb, and the complete code is in scripts/trajectory_generation.py. Trajectory generation can be sped up several times by using multiple processes. A naive way to do this is:
sh data_generation.sh
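A launcher script like this essentially runs several independent generation processes in parallel. If you prefer to control the sharding yourself, here is a minimal Python sketch; the --task and --save-dir flags are assumptions, so check scripts/trajectory_generation.py for the actual interface:

```python
# Naive multi-process speed-up: one trajectory_generation.py process per
# task. The flag names below are assumptions; adapt them to the real CLI.
import subprocess

tasks = ["task_a", "task_b", "task_c"]  # placeholder task names
procs = [
    subprocess.Popen(
        ["python", "scripts/trajectory_generation.py",
         "--task", task, "--save-dir", f"data/{task}"]
    )
    for task in tasks
]
for p in procs:
    p.wait()  # block until every generation process finishes
```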
Currently, the code itself does not support multi-process environments; we will optimize collection efficiency as much as possible in future updates. After running the script, each trajectory is stored as an HDF5 file in the directory you specify.
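To inspect what one of these files contains, a generic h5py walk works without assuming any particular key layout (the file path below is a placeholder):

```python
# Print the group/dataset tree of one generated trajectory file.
import h5py

def describe(name, obj):
    if isinstance(obj, h5py.Dataset):
        print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")

with h5py.File("data/task_a/episode_0.hdf5", "r") as f:  # placeholder path
    f.visititems(describe)
```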
Because frameworks such as Octo and OpenVLA train on data in the RLDS format, we follow the process from rlds_dataset_builder to provide an example of converting the HDF5 dataset above into RLDS format. First, run
python scripts/convert_to_rlds.py --task [list] --save_dir /your/path/to/dataset
This will create a Python file containing the RLDS dataset builder for the task in that directory. Then run
cd /your/path/to/dataset/task
tfds build
This process takes a long time with a single process, and we are still testing a multithreaded method; the code in the original repo seems to have some bugs.
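Once tfds build completes, you can confirm that the dataset loads correctly. A short sketch, where the dataset name and data_dir are placeholders for your build location (tfds can read a prepared dataset directly from data_dir without the builder code):

```python
# Load the freshly built RLDS dataset to verify the conversion.
import tensorflow_datasets as tfds

ds = tfds.load("your_task", data_dir="/your/path/to/dataset", split="train")
for episode in ds.take(1):
    # RLDS stores each episode as a dict whose "steps" field holds the
    # per-timestep observations and actions.
    print(episode.keys())
```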
Following the way openpi processes the Libero dataset, we offer a simple way to convert HDF5 data files into the LeRobot format. Run the script with
python scripts/convert_to_lerobot.py --dataset-name [your-dataset-name] --dataset-path /your/path/to/dataset --max-files 100
By default, the processed LeRobot dataset is stored in $HF_HOME/lerobot/[your-dataset-name].
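To confirm the conversion, the dataset can be loaded back with lerobot. A sketch assuming a recent lerobot version, where LeRobotDataset resolves local datasets under HF_HOME/lerobot (the dataset name is a placeholder):

```python
# Load the converted dataset and peek at one frame.
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("your-dataset-name")  # placeholder name
print(dataset.num_episodes, "episodes")
print(dataset[0].keys())  # feature keys of the first frame
```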
- Organize the functional code sections.
- Rebuild the evaluation framework to be efficient, user-friendly, and comprehensive.
- Streamline the automatic data workflow for existing tasks.
- Improve the DSL of the skill library.
- Release the trajectory generation and evaluation scripts.
- Test the interfaces for humanoid and dual-arm manipulation.
- Release the remaining tasks not included in the preview version.
- Integrate commonly used VLA models to facilitate replication. (Continuously updated)
- Publish a leaderboard of VLAs and VLMs on the standard evaluation.
- Release standard evaluation datasets/episodes across different dimensions and difficulty levels.
- Release the standard fine-tuning dataset.
VLABench adopts a flexible, modular framework for task construction, offering high adaptability. You can follow the process outlined in tutorial 6 to build your own tasks.
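As a purely hypothetical illustration of what such a modular task definition might look like (the class and method names below are invented for illustration and are not VLABench's actual API; see tutorial 6 for the real interface):

```python
# Hypothetical sketch only -- none of these names are VLABench's real API.
# A modular task typically composes three pieces: scene assets, a language
# instruction, and a success predicate.
class PickAppleTask:
    def build_scene(self, scene):
        # Place reusable assets into the scene at chosen poses.
        scene.add_asset("apple", position=(0.3, 0.0, 0.8))
        scene.add_asset("basket", position=(0.5, 0.2, 0.8))

    def instruction(self):
        return "Pick up the apple and place it in the basket."

    def check_success(self, state):
        # The task succeeds when the apple rests inside the basket.
        return state.distance("apple", "basket") < 0.05
```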
Because each model is packaged differently, we initially provide the evaluation method for OpenVLA only. Evaluation scripts for other models are currently being integrated via git submodules.
- Evaluate OpenVLA
Before evaluating your fine-tuned OpenVLA, please compute the norm_stats of your dataset and place them in
VLABench/configs/model/openvla_config.json
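If you need to compute these statistics yourself, here is a minimal sketch. It assumes each HDF5 trajectory stores an action array under the key "action"; both that key and the exact JSON layout OpenVLA expects are assumptions, so match them to your data and to OpenVLA's documented norm_stats format:

```python
# Compute per-dimension action statistics across all generated trajectories.
# The "action" key and the output layout are assumptions; adjust both to
# your HDF5 schema and to the norm_stats format OpenVLA expects.
import glob
import json

import h5py
import numpy as np

actions = []
for path in glob.glob("data/**/*.hdf5", recursive=True):
    with h5py.File(path, "r") as f:
        actions.append(f["action"][:])
actions = np.concatenate(actions, axis=0)

stats = {
    "mean": actions.mean(axis=0).tolist(),
    "std": actions.std(axis=0).tolist(),
    "q01": np.quantile(actions, 0.01, axis=0).tolist(),
    "q99": np.quantile(actions, 0.99, axis=0).tolist(),
}
with open("VLABench/configs/model/openvla_config.json", "w") as f:
    json.dump(stats, f, indent=2)
```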
Run the evaluation script with
python scripts/evaluate_policy.py --n-sample 20 --model openvla --model_ckpt xx --lora_ckpt xx
- Evaluate openpi
Please use
git submodule update --init --recursive
to ensure that you have correctly installed the repositories for the other models.
For openpi, you should create a virtual environment with uv and run the policy server. Then you can evaluate the fine-tuned openpi on VLABench. Please refer here for an example.
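For reference, querying a running openpi policy server from Python typically looks like the sketch below; the observation keys are placeholders, and the host/port match the defaults used in openpi's examples, so consult the openpi README for the authoritative client usage:

```python
# Sketch: query an openpi policy server from an evaluation loop.
# The observation keys are placeholders; match them to the policy's config.
import numpy as np
from openpi_client import websocket_client_policy

policy = websocket_client_policy.WebsocketClientPolicy(host="localhost", port=8000)
observation = {
    "image": np.zeros((224, 224, 3), dtype=np.uint8),  # placeholder frame
    "prompt": "pick up the apple",                     # placeholder instruction
}
result = policy.infer(observation)  # returns a dict of outputs
print(result["actions"])            # predicted action chunk
```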
@misc{zhang2024vlabench,
title={VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks},
author={Shiduo Zhang and Zhe Xu and Peiju Liu and Xiaopeng Yu and Yuan Li and Qinghui Gao and Zhaoye Fei and Zhangyue Yin and Zuxuan Wu and Yu-Gang Jiang and Xipeng Qiu},
year={2024},
eprint={2412.18194},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2412.18194},
}