This repository contains the implementation of the paper:
DriveMM: All-in-One Large Multimodal Model for Autonomous Driving
Zhijian Huang*, Chengjian Feng*, Feng Yan, Baihui Xiao, Zequn Jie, Yujie Zhong, Xiaodan Liang†, Lin Ma†
*Equal Contribution †Corresponding Authors
🔥 We propose DriveMM, a novel all-in-one large multimodal model equipped with general capabilities to perform a wide range of autonomous driving (AD) tasks and the generalization ability to transfer effectively to new datasets.

🔥 We introduce comprehensive benchmarks for evaluating autonomous driving LMMs, covering six public datasets, four input types, and thirteen challenging tasks. To the best of our knowledge, this is the first work to use multiple benchmarks to evaluate autonomous driving LMMs.

🔥 We present a curriculum principle for pre-training and fine-tuning on both diverse multimodal data and AD data. DriveMM achieves state-of-the-art performance and consistently outperforms models trained on individual datasets across all evaluated benchmarks.

To get started, clone the repository and set up the environment:

```bash
git clone https://github.com/zhijian11/DriveMM
cd DriveMM
conda create -n drivemm python=3.10 -y
conda activate drivemm
pip install --upgrade pip  # Enable PEP 660 support.
pip install -e ".[train]"
```
- Download the checkpoints and put them in the ckpt/ folder (a download sketch is shown below).
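If the checkpoints are hosted on the Hugging Face Hub, a download might look like the following; the repository ID here is a placeholder, so substitute the one from the release page:

```bash
# Hypothetical example: replace <org>/DriveMM with the actual
# Hugging Face repo ID published for the DriveMM checkpoints.
pip install -U "huggingface_hub[cli]"
huggingface-cli download <org>/DriveMM --local-dir ckpt/
```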
```bash
cd scripts/inference_demo
python demo_image.py  # for image input
python demo_video.py  # for video input
```
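Since DriveMM builds on LLaVA-NeXT, the demo scripts likely follow its loading and generation conventions. The sketch below is an illustrative approximation under that assumption, not the repository's actual demo code; the checkpoint path, model name, conversation template, image file, and prompt are all placeholders:

```python
# Illustrative sketch only -- the real logic lives in demo_image.py.
# Assumes the LLaVA-NeXT-style API this repo is built on; paths and
# the conversation template name are guesses.
import torch
from PIL import Image
from llava.model.builder import load_pretrained_model
from llava.mm_utils import process_images, tokenizer_image_token
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates

model_path = "ckpt/DriveMM"  # hypothetical local checkpoint directory
tokenizer, model, image_processor, _ = load_pretrained_model(
    model_path, None, "llava_llama", device_map="auto"
)

# Load and preprocess a single driving image (path is a placeholder).
image = Image.open("assets/example.jpg").convert("RGB")
image_tensor = process_images([image], image_processor, model.config)
image_tensor = [t.to(dtype=torch.float16, device=model.device) for t in image_tensor]

# Build a single-image prompt around the image placeholder token.
conv = conv_templates["llava_llama_3"].copy()  # template name is a guess
conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + "\nDescribe the driving scene.")
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

input_ids = tokenizer_image_token(
    prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt"
).unsqueeze(0).to(model.device)

with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=image_tensor,
        image_sizes=[image.size],
        do_sample=False,
        max_new_tokens=256,
    )
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```

For video input, demo_video.py presumably samples frames and passes a list of images through the same pipeline.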
- DriveMM models
- DriveMM inference code
- DriveMM evaluation code
- DriveMM training data
- DriveMM training code
This project builds on excellent open-source repositories such as LLaVA-NeXT. Thanks for their wonderful work and contributions to the community.
If you find DriveMM helpful for your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.
```bibtex
@article{huang2024drivemm,
  title={DriveMM: All-in-One Large Multimodal Model for Autonomous Driving},
  author={Huang, Zhijian and Feng, Chengjian and Yan, Feng and Xiao, Baihui and Jie, Zequn and Zhong, Yujie and Liang, Xiaodan and Ma, Lin},
  journal={arXiv preprint arXiv:2412.07689},
  year={2024}
}
```