VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving

Zilin Huang¹ †, Zihao Sheng¹ †, Yansong Qu² †, Junwei You¹, Sikai Chen¹ ✉

¹ University of Wisconsin-Madison, ² Purdue University

† Equally contributing first authors, ✉ Corresponding author

💡 Highlights

🔥 To the best of our knowledge, VLM-RL is the first work in the autonomous driving field to unify VLMs with RL for end-to-end driving policy learning in the CARLA simulator.

🏁 VLM-RL outperforms state-of-the-art baselines, achieving a 10.5% reduction in collision rate, a 104.6% increase in route completion rate, and robust generalization to unseen driving scenarios.

Demo videos: Route 1 through Route 10, plus an overtaking scenario.

📋 Table of Contents

  1. Highlights
  2. Getting Started
  3. Training
  4. Evaluation
  5. Contributors
  6. Citation
  7. Other Resources

🛠️ Getting Started

  1. Download and install CARLA 0.9.13 from the official release page.
  2. Create a conda env and install the requirements:
# Clone the repo
git clone https://github.com/zihaosheng/VLM-RL.git
cd VLM-RL

# Create a conda env
conda create -y -n vlm-rl python=3.8
conda activate vlm-rl

# Install PyTorch
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116

# Install the requirements
pip install -r requirements.txt
  3. Start a CARLA server with the following command. You can skip this step if start_carla=True (the training script will launch CARLA itself).
./CARLA_0.9.13/CarlaUE4.sh -quality-level=Low -benchmark -fps=15 -RenderOffScreen -prefernvidia -carla-world-port=2000

If you use start_carla=True, set CARLA_ROOT in carla_env/envs/carla_route_env.py to the path of your CARLA installation.
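
As a minimal illustration (only the CARLA_ROOT name comes from the repo; the exact contents of carla_env/envs/carla_route_env.py will differ), the change amounts to:

# In carla_env/envs/carla_route_env.py (sketch; surrounding code omitted).
# Point CARLA_ROOT at the directory that contains CarlaUE4.sh.
CARLA_ROOT = "/path/to/CARLA_0.9.13"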

(back to top)

🚋 Training

Training VLM-RL

To reproduce the results in the paper, run the following training command:

python train.py --config=vlm_rl --start_carla --no_render --total_timesteps=1_000_000 --port=2000 --device=cuda:0

Note: On the first run, the script automatically downloads the required OpenCLIP pre-trained model, which may take a few minutes; training starts once the download completes.
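
If you prefer to warm the cache before launching training, you can pre-download an OpenCLIP checkpoint yourself. The snippet below is only a sketch: the model name and pretrained tag are assumptions, and the variant VLM-RL actually uses is defined in config.py.

# Sketch: pre-download an OpenCLIP checkpoint into the local cache.
# "ViT-B-32" / "laion2b_s34b_b79k" are placeholder choices; check config.py
# for the variant the VLM-RL reward model actually loads.
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
print("OpenCLIP checkpoint cached.")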

To accelerate training, you can run multiple CARLA servers in parallel.

For example, to train the VLM-RL model with 3 CARLA servers on different GPUs, run the following commands in three separate terminals (a single-script alternative is sketched after them):

Terminal 1:

python train.py --config=vlm_rl --start_carla --no_render --total_timesteps=1_000_000 --port=2000 --device=cuda:0

Terminal 2:

python train.py --config=vlm_rl --start_carla --no_render --total_timesteps=1_000_000 --port=2005 --device=cuda:1

Terminal 3:

python train.py --config=vlm_rl --start_carla --no_render --total_timesteps=1_000_000 --port=2010 --device=cuda:2
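
If you prefer a single script over three terminals, a small launcher like the sketch below does the same thing; it is not part of the repo, and the port spacing and GPU ids simply mirror the commands above.

# Sketch: launch the three training runs above from one script.
# Not part of the repo; adjust ports/GPUs to your machine.
import subprocess

jobs = []
for i in range(3):
    jobs.append(subprocess.Popen([
        "python", "train.py",
        "--config=vlm_rl",
        "--start_carla",
        "--no_render",
        "--total_timesteps=1_000_000",
        f"--port={2000 + 5 * i}",   # 2000, 2005, 2010: one port block per CARLA server
        f"--device=cuda:{i}",       # one GPU per run
    ]))

for job in jobs:
    job.wait()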

To train the VLM-RL model with PPO, run:

python train.py --config=vlm_rl_ppo --start_carla --no_render --total_timesteps=1_000_000 --port=2000 --device=cuda:0

Training Baselines

To train baseline models, simply change the --config argument to the desired model. For example, to train the TIRL-SAC model, run:

python train.py --config=tirl_sac --start_carla --no_render --total_timesteps=1_000_000 --port=2000 --device=cuda:0

More baseline models can be found in the CONFIGS dictionary of config.py.
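
To see which --config values are available, you can list the keys of that dictionary. This is a sketch that assumes CONFIGS is a module-level dict in config.py, as the sentence above suggests.

# Sketch: print the available --config names (assumes CONFIGS is a
# module-level dict in config.py).
from config import CONFIGS

for name in sorted(CONFIGS):
    print(name)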

(back to top)

📊 Evaluation

To evaluate trained model checkpoints, run:

python run_eval.py

Note: this command first kills all running CARLA servers and then starts a new one, so avoid running it while training is in progress.
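
If you want to double-check that no CARLA server (and hence no training run) is still active before evaluating, a quick process check such as the sketch below can help. It is not part of the repo, relies on pgrep (Linux/macOS), and the "CarlaUE4" process-name pattern is an assumption.

# Sketch: warn if a CARLA server still appears to be running.
# Uses pgrep, so Linux/macOS only; "CarlaUE4" is an assumed process name.
import subprocess

result = subprocess.run(["pgrep", "-f", "CarlaUE4"], capture_output=True, text=True)
if result.stdout.strip():
    print("CARLA server(s) still running, PIDs:", result.stdout.split())
else:
    print("No CARLA server detected; safe to run run_eval.py.")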

(back to top)

👥 Contributors

Special thanks to the following contributors who have helped with this project:

  • Zihao Sheng (@zihaosheng)
  • Zilin Huang (@zilinhuang)
  • Yansong Qu (@yansongqu)
  • Junwei You (@junweiyou)

(back to top)

🎯 Citation

If you find VLM-RL useful for your research, please consider giving us a star 🌟 and citing our paper:

@article{huang2024vlmrl,
  title={VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving},
  author={Huang, Zilin and Sheng, Zihao and Qu, Yansong and You, Junwei and Chen, Sikai},
  journal={arXiv preprint arXiv:2412.15544},
  year={2024}
}

(back to top)

📚 Other Resources

Our team is actively working on research projects in the field of AI and autonomous driving. Here are a few of them you might find interesting:

(back to top)
