Explanation for Trajectory Planning using Multi-modal Large Language Model for Autonomous Driving

29 Mar 2024 (modified: 27 Apr 2024) · Submitted to VLADR 2024 · CC BY 4.0
Keywords: Vision and Language Model, Autonomous Driving
Abstract: In autonomous driving, conveying the intention behind the ego vehicle's driving behavior to the driver or passengers is important for achieving reliable and trustworthy driving. Most conventional methods that explain the ego vehicle's intention and its justification use only image input, without trajectory information, which is insufficient for explaining the ego vehicle's intention. In this study, we propose a multi-modal large language model based explanation method for trajectory planning that takes as input not only the frontal image but also the ego vehicle's trajectory planning information. Using a dedicated dataset in which the frontal video and trajectory planning information are acquired simultaneously, we confirm that this method yields more effective explanations than the case without trajectory information.
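The abstract does not specify how the trajectory is fed to the model; a common approach is to serialize the planned waypoints into text alongside an image placeholder token. The sketch below is a hypothetical illustration of that idea, not the paper's actual implementation — the function names, the `<image>` placeholder, and the prompt wording are all assumptions.

```python
def format_trajectory(waypoints, precision=1):
    """Serialize planned (x, y) waypoints (meters, ego frame) into a text
    snippet that can be appended to the language prompt. Hypothetical helper,
    not from the paper."""
    points = ", ".join(f"({x:.{precision}f}, {y:.{precision}f})" for x, y in waypoints)
    return f"Planned trajectory (ego frame, meters): [{points}]"


def build_prompt(image_token, trajectory_text):
    """Combine an image placeholder token with the serialized trajectory and
    an explanation request. Assumed prompt layout for illustration only."""
    return (
        f"{image_token}\n"
        f"{trajectory_text}\n"
        "Explain the intention behind the ego vehicle's planned maneuver."
    )


# Example: a gentle drift to the left over the next few meters.
waypoints = [(0.0, 0.0), (2.0, 0.1), (4.1, 0.5), (6.3, 1.2)]
prompt = build_prompt("<image>", format_trajectory(waypoints))
print(prompt)
```

In this setup the multi-modal LLM would receive the frontal image in place of the `<image>` token and the trajectory as ordinary text; the ablation described in the abstract would correspond to dropping the trajectory line from the prompt.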
Submission Number: 13