Keywords: Autonomous Driving; Agent; Planning
Abstract: Due to the impressive capabilities of multimodal large language models (MLLMs), recent works have focused on employing MLLM-based agents for autonomous driving in large-scale and dynamic environments. However, prevalent approaches often directly use MLLMs to translate high-level instructions into low-level vehicle control signals. This approach deviates from the inherent language generation paradigm of MLLMs and fails to fully harness their emergent capabilities. As a result, the generalizability of these methods is limited by the autonomous driving datasets used during fine-tuning.
To tackle this challenge, we propose AD-H, a hierarchical framework in which two agents (an MLLM planner and a controller) collaborate. The MLLM planner perceives environmental information, interprets high-level instructions, and generates mid-level, fine-grained driving commands, which the controller then executes as low-level actions. This compositional paradigm liberates the MLLM from decoding low-level control signals, thus fully leveraging its high-level perception, reasoning, and planning capabilities. Furthermore, the fine-grained commands provided by the MLLM planner enable the controller to act more effectively.
To train AD-H, we build a new autonomous driving dataset with hierarchical action annotations encompassing multiple levels of instructions and driving commands.
Comprehensive closed-loop evaluations demonstrate several key advantages of our proposed AD-H system.
First, AD-H notably outperforms state-of-the-art methods in driving performance and even exhibits self-correction during vehicle operation, a behavior not encountered in the training dataset. Second, AD-H demonstrates superior generalization to long-horizon instructions and novel environmental conditions, significantly surpassing current state-of-the-art methods.
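To make the two-level paradigm described above concrete, the following minimal Python sketch illustrates a planner-controller closed loop in which a high-frequency controller executes the mid-level commands produced by a lower-frequency MLLM planner. All names here (MLLMPlanner, Controller, drive, planner_interval, env, etc.) are hypothetical placeholders for illustration only and do not reflect the authors' actual implementation or API.

```python
# Hypothetical sketch of the hierarchical planner-controller loop.
# Names and interfaces are assumptions, not the AD-H codebase.

class MLLMPlanner:
    """High-level agent: turns an observation and a high-level instruction
    into a mid-level, fine-grained driving command (natural language)."""
    def plan(self, observation, instruction: str) -> str:
        # e.g., query a multimodal LLM; might return a command such as
        # "slow down and prepare to turn left at the intersection"
        raise NotImplementedError

class Controller:
    """Low-level agent: maps a mid-level command (plus the current
    observation) to vehicle control signals such as steering and throttle."""
    def act(self, observation, command: str) -> dict:
        raise NotImplementedError

def drive(env, planner: MLLMPlanner, controller: Controller,
          instruction: str, planner_interval: int = 10, max_steps: int = 1000):
    """Closed-loop rollout: the planner is queried every few steps,
    while the controller produces a control signal at every step."""
    observation = env.reset()
    command = planner.plan(observation, instruction)
    for step in range(max_steps):
        if step % planner_interval == 0:
            command = planner.plan(observation, instruction)  # re-plan
        control = controller.act(observation, command)        # low-level action
        observation, done = env.step(control)
        if done:
            break
```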
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5789