
Drive on Language: Unlocking the future where autonomous driving meets the unlimited potential of language.

[Demo video: DriveLM.mp4]

🔥 Highlights of the DriveLM Dataset

From the perspective of general Vision Language Models

  • 🌳 Structured reasoning, multi-modal Graph-of-Thought testbench.
[Demo video: tree_827.mp4]

From the perspective of full-stack autonomous driving

  • 🛣 Completeness in functionality (covering Perception, Prediction and Planning QA pairs).

  • 🔜 Reasoning for future events that have not yet happened.
    • Many "What if"-style questions: imagining the future through language.

  • ♻ Task-driven decomposition.
    • One scene-level text goal is decomposed into many frame-level trajectories & planning text descriptions.


News

  • [2023/08/25] DriveLM dataset demo v1.0 released.

(back to top)

Introduction

DriveLM is an autonomous driving (AD) dataset incorporating linguistic information. Through DriveLM, we aim to connect large language models and AD systems, and ultimately bring the reasoning ability of large language models into AD to support decision-making and explainable planning.

Specifically, in DriveLM, we facilitate Perception, Prediction, and Planning (P3) with human-written reasoning logic as a connection. To take it a step further, we leverage the idea of Graph-of-Thought (GoT) to connect the QA pairs in a graph-style structure and use "What if"-style questions to reason about future events that have not yet happened.

Currently, a demo of the dataset has been released, and the full dataset and the model will be released in the future.

What is Graph-of-Thoughts in AD?

The most exciting aspect of the dataset is that the questions and answers (QA) in P3 are connected in a graph-style structure, with each QA pair as a node and objects' relationships as the edges. Compared to language-only Tree-of-Thought or Graph-of-Thought, we go a step further towards multi-modality. The reason for doing this in the AD domain is that AD tasks are well-defined per stage, from raw sensor input to final control action.
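To picture the graph structure, here is a minimal sketch in Python of how P3 QA pairs could be stored as nodes with object relationships as edges. The field names (`qa_id`, `stage`, `objects`) and the example questions are illustrative assumptions, not the official DriveLM schema.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class QANode:
    qa_id: str
    stage: str            # "perception", "prediction", or "planning"
    question: str
    answer: str
    objects: list[str]    # key objects this QA pair refers to

@dataclass
class QAGraph:
    nodes: dict[str, QANode] = field(default_factory=dict)
    edges: list[tuple[str, str]] = field(default_factory=list)  # (upstream_qa_id, downstream_qa_id)

    def add_node(self, node: QANode) -> None:
        self.nodes[node.qa_id] = node

    def connect(self, upstream_id: str, downstream_id: str) -> None:
        # An edge means the downstream QA reasons over the result of the upstream QA.
        self.edges.append((upstream_id, downstream_id))

# A perception QA feeding a planning QA about the same key object.
graph = QAGraph()
graph.add_node(QANode("q1", "perception", "What is the moving status of the pedestrian?",
                      "Crossing the road.", ["pedestrian_01"]))
graph.add_node(QANode("q2", "planning", "What should the ego vehicle do?",
                      "Slow down and yield to the pedestrian.", ["pedestrian_01"]))
graph.connect("q1", "q2")
```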

📊 Comparison and stats: the first language-driving dataset facilitating P3 and logic

| Language Dataset | Base Dataset | Language Form | Perspectives | Scale | Release? |
|:---|:---|:---|:---|:---|:---:|
| BDD-X 2018 | BDD | Description | Planning Description & Justification | 8M frames, 20k text strings | ✔️ |
| HAD HRI Advice 2019 | HDD | Advice | Goal-oriented & stimulus-driven advice | 5,675 video clips, 45k text strings | ✔️ |
| Talk2Car 2019 | nuScenes | Description | Goal Point Description | 30k frames, 10k text strings | ✔️ |
| DRAMA 2022 | - | Description | QA + Captions | 18k frames, 100k text strings | ✔️ |
| nuScenes-QA 2023 | nuScenes | QA | Perception Result | 30k frames, 460k generated QA pairs | |
| DriveLM 2023 | nuScenes | 💥 QA + Scene Description | 💥 Perception, Prediction and Planning with Logic | 30k frames, 360k annotated QA pairs | ✔️ |

What is included in the DriveLM dataset?

We construct our dataset based on the prevailing nuScenes dataset. The most central element of DriveLM is frame-based P3 QA. Perception questions require the model to recognize objects in the scene. Prediction questions ask the model to predict the future status of important objects in the scene. Planning questions prompt the model to give reasonable planning actions and avoid dangerous ones.
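As a rough illustration of what a frame-level P3 entry could look like, the snippet below sketches one keyframe with a perception, a prediction, and a planning QA pair. The keys (`scene_token`, `frame_token`, `perception`, etc.) and the question wording are assumptions for illustration only; the released demo data defines the actual format.

```python
import json

# Hypothetical layout of one keyframe's P3 annotations; consult the demo data
# for the real schema and question wording.
example_frame = {
    "scene_token": "scene_0001",    # placeholder identifier
    "frame_token": "frame_0042",    # placeholder identifier
    "perception": [
        {"q": "What objects are worth noting in the front camera view?",
         "a": "A pedestrian near the crosswalk and a white sedan ahead."},
    ],
    "prediction": [
        {"q": "What will the pedestrian most likely do next?",
         "a": "Cross the road in front of the ego vehicle."},
    ],
    "planning": [
        {"q": "What is a safe action for the ego vehicle?",
         "a": "Decelerate and yield to the pedestrian before proceeding."},
    ],
}

print(json.dumps(example_frame, indent=2))
```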

How about the annotation process?

1️⃣ Keyframe selection. Given all frames in one clip, the annotator selects the keyframes that need annotation. The criterion is that those frames should involve changes in ego-vehicle movement status (lane changes, sudden stops, starting after a stop, etc.).

2️⃣ Key object selection. Given keyframes, the annotator needs to pick out key objects in the six surrounding images. The criterion is that those objects should be able to affect the action of the ego vehicle (traffic signals, pedestrians crossing the road, other vehicles moving in the direction of the ego vehicle, etc.).

3️⃣ Question and answer annotation. Given those key objects, we automatically generate questions regarding single or multiple objects about perception, prediction, and planning. More details can be found in our demo data.
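To make step 3 concrete, here is a hedged sketch of how template-based questions could be generated from the annotated key objects. The templates, the `category`/`camera` fields, and the camera names are illustrative assumptions, not the exact procedure used for DriveLM; see the demo data for the questions actually produced.

```python
def generate_questions(key_objects: list[dict]) -> list[str]:
    """Produce perception / prediction / planning questions from key objects (illustrative templates)."""
    questions = []
    for obj in key_objects:
        name, cam = obj["category"], obj["camera"]
        questions.append(f"What is the moving status of the {name} in the {cam} view?")         # perception
        questions.append(f"Will the {name} in the {cam} view affect the ego vehicle's path?")    # prediction
        questions.append(f"What should the ego vehicle do about the {name} in the {cam} view?")  # planning
    return questions

# Hypothetical key objects picked in step 2.
key_objects = [
    {"category": "pedestrian", "camera": "CAM_FRONT"},
    {"category": "traffic light", "camera": "CAM_FRONT_LEFT"},
]
for q in generate_questions(key_objects):
    print(q)
```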

(back to top)

Getting Started

(back to top)

License and Citation

All assets and code in this repository are under the Apache 2.0 license unless specified otherwise. The language data is under CC BY-NC-SA 4.0. Other datasets (including nuScenes) inherit their own distribution licenses. Please consider citing our project if it helps your research.

@misc{drivelm2023,
  title={DriveLM: Drive on Language},
  author={DriveLM Contributors},
  howpublished={\url{https://github.com/OpenDriveLab/DriveLM}},
  year={2023}
}

(back to top)

Other Projects

Awesome

OpenDriveLab

Autonomous Vision Group

(back to top)
