Abstract: Explainable decision-making is critical for building trust in autonomous vehicles. We investigate the use of a pre-trained large language model (LLM) to derive comprehensible driving decisions from multi-modal time-series data captured by a monocular camera on an autonomous vehicle. Leveraging a graph-of-thought structure, the LLM learns policies that perform robustly while generating natural-language rationales. We construct a novel multi-modal dataset comprising sequential images, scene labels, and driving actions. Results demonstrate that our method produces human-understandable explanations for its driving choices, providing transparency. Our work indicates that incorporating language-based reasoning enables accountable and transparent decision-making for self-driving cars, making LLMs a potential solution for autonomous driving.