World-Model based Hierarchical Planning with Semantic Communications for Autonomous Driving

Dechen Gao; Hang Wang; Shuangyu Cai; Hanchu Zhou; Nejib Ammar; Shatadal Mishra; Iman Soltani; Junshan Zhang

28 Sept 2024 (modified: 05 Feb 2025), Submitted to ICLR 2025, License: CC BY 4.0

Keywords: World Model, Hierarchical Planning, Reinforcement Learning, Autonomous Driving, Communications

TL;DR: We propose a hierarchical world-model framework that mimics human driving by planning and communicating higher-level driving actions. We also devise AdaSMO to address the multi-objective optimization problem in hierarchical training.

Abstract: World models (WMs) are a highly promising approach for training AI agents. However, in complex learning systems such as autonomous driving, AI agents interact with other agents in a dynamic environment and face significant challenges such as partial observability and non-stationarity. Inspired by how humans naturally solve complex tasks hierarchically and how drivers share their intentions through turn signals, we introduce HANSOME, a WM-based framework for hierarchical planning with semantic communications. In HANSOME, semantic information, particularly text and compressed visual data, is generated and shared to improve two-level planning. HANSOME incorporates two key designs: 1) a hierarchical planning strategy, in which the higher-level policy generates intentions with text semantics and a semantic alignment technique ensures that the lower-level policy selects specific controls that achieve these intentions; and 2) a cross-modal encoder-decoder that fuses the shared semantic information and uses it to enhance planning through multi-modal understanding. A key advantage of HANSOME is that the generated intentions not only enhance the lower-level policy but can also be shared with, and understood by, humans or other autonomous vehicles (AVs) to improve their planning. Furthermore, we devise AdaSMO, an entropy-controlled adaptive scalarization method, to tackle the multi-objective optimization problem in hierarchical policy learning. Extensive experiments show that HANSOME outperforms state-of-the-art WM-based methods on challenging driving tasks, improving overall traffic safety and efficiency.
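The abstract names AdaSMO as an entropy-controlled adaptive scalarization method but does not give its update rule. As a rough illustration of the general idea only (this is not the authors' AdaSMO), the sketch below assumes that K per-objective losses (for example, hypothetical higher-level, lower-level, and semantic-alignment losses) are combined into one scalar using softmax weights, where the softmax temperature is chosen by bisection so that the entropy of the weights matches a target value. The function names, the loss-to-weight mapping (larger loss receives larger weight), and the temperature bounds are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float], temperature: float) -> List[float]:
    """Numerically stable softmax of scores / temperature."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # underflows to 0.0, never raises
    z = sum(exps)
    return [e / z for e in exps]


def entropy(weights: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def entropy_controlled_weights(
    losses: Sequence[float],
    target_entropy: float,
    t_min: float = 1e-3,  # hypothetical lower bound on the temperature
    t_max: float = 1e3,   # hypothetical upper bound on the temperature
    iters: int = 60,
) -> List[float]:
    """Hypothetical helper: choose a softmax temperature so H(weights) ~= target.

    Larger losses get larger weights (focus on the lagging objective);
    the temperature controls how uniform the weighting is. Entropy of a
    softmax is non-decreasing in temperature, so bisection is valid.
    """
    k = len(losses)
    if not 0.0 <= target_entropy <= math.log(k):
        raise ValueError("target_entropy must lie in [0, log K]")
    if max(losses) == min(losses):
        return [1.0 / k] * k  # all objectives tied: uniform weights
    lo, hi = t_min, t_max
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect in log-space; T spans several decades
        if entropy(softmax(losses, mid)) < target_entropy:
            lo = mid  # weights too peaked: raise the temperature
        else:
            hi = mid  # weights too flat: lower the temperature
    return softmax(losses, math.sqrt(lo * hi))


def scalarize(losses: Sequence[float], target_entropy: float) -> float:
    """Hypothetical scalarized objective: entropy-controlled weighted sum."""
    weights = entropy_controlled_weights(losses, target_entropy)
    return sum(w * l for w, l in zip(weights, losses))


if __name__ == "__main__":
    demo_losses = [0.5, 1.0, 2.0]
    print(entropy_controlled_weights(demo_losses, 0.5 * math.log(3)))
    print(scalarize(demo_losses, 0.5 * math.log(3)))
```

In an actual training loop the losses would be tensors, and the weights would typically be treated as constants (no gradient through the temperature search). Under this assumed scheme, a low target entropy concentrates the update on the worst-performing objective, while a target near log K approaches uniform weighting.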

Supplementary Material: zip

Primary Area: applications to robotics, autonomy, planning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 13103
