PKRD-CoT: A Unified Chain-of-Thought Prompting for Multi-modal Large Language Models in Autonomous Driving
Abstract: There is growing interest in leveraging the capabilities of robust Multi-Modal Large Language Models (MLLMs) directly within autonomous driving contexts. However, the high cost and complexity of designing and training end-to-end autonomous driving models put them out of reach of many enterprises and research entities. To address this, our study explores a seamless integration of MLLMs into autonomous driving systems by proposing a Zero-Shot Chain-of-Thought (Zero-shot-CoT) prompt design named PKRD-CoT. PKRD-CoT is built on the four fundamental capabilities of autonomous driving: perception, knowledge, reasoning, and decision-making. This structure makes it particularly suitable for understanding and responding to dynamic driving environments, because it mimics human thought processes step by step to enhance decision-making in real-time scenarios. Our design enables MLLMs to tackle problems without prior experience, thus enhancing their utility within unstructured autonomous driving environments. In experiments, we demonstrate the exceptional performance of GPT 4.0 with PKRD-CoT across autonomous driving tasks, highlighting its effectiveness for application in autonomous driving scenarios. Additionally, our benchmark analysis reveals promising viability of PKRD-CoT for other MLLMs such as Claude, LLava1.6, and Qwen-VL-Plus. Overall, this study contributes a novel and unified prompt design framework for GPT 4.0 and other MLLMs in autonomous driving, while also evaluating the efficacy of these widely recognized MLLMs in the autonomous driving domain through rigorous comparisons.
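The four-stage structure described in the abstract can be illustrated with a minimal sketch of a zero-shot chain-of-thought prompt builder. The abstract names the stages (perception, knowledge, reasoning, decision-making) but does not give the prompt text, so the instruction wording, the `STAGES` constant, and the `build_pkrd_prompt` function below are all hypothetical and are not the authors' actual prompt.

```python
# Hypothetical stage instructions: the abstract lists the four stages,
# but the wording of each instruction here is an assumption.
STAGES = (
    ("Perception", "Describe the objects, road layout, and traffic signals in the scene."),
    ("Knowledge", "State the traffic rules and driving conventions relevant to this scene."),
    ("Reasoning", "Explain how the observed scene and the relevant rules interact."),
    ("Decision-making", "Choose the next driving action and justify it briefly."),
)


def build_pkrd_prompt(scene_description: str) -> str:
    """Assemble a zero-shot prompt that walks through the four stages in order.

    No worked examples are included, which is what makes the prompt zero-shot.
    """
    lines = [f"Scene: {scene_description}", "Let's think step by step."]
    for index, (name, instruction) in enumerate(STAGES, start=1):
        lines.append(f"Step {index} ({name}): {instruction}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_pkrd_prompt("A pedestrian is crossing ahead at an unsignalized crosswalk."))
```

In practice the resulting string would be sent to an MLLM together with the camera image; that call is omitted here because it depends on the specific model API.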
External IDs: doi:10.1007/978-981-96-7008-6_5