Position: Prospective of Autonomous Driving - Multimodal LLMs, World Models, Embodied Intelligence, AI Alignment, and Mamba

Published: 01 Jan 2025, Last Modified: 15 Oct 2025, WACV (Workshops) 2025, CC BY-SA 4.0
Abstract: With the emergence of Generative AI, multimodal AI systems that leverage foundation models are beginning to demonstrate enormous potential for perceiving the real world, collecting new data, making decisions, and using tools as humans do. In recent years, the use of Large Language Models and World Models in autonomous driving has received widespread attention. Despite this potential, the key challenges, opportunities, and future applications of these new foundation models in driving systems are still not comprehensively understood. In this paper, we provide an outlook on this field, summarizing existing methods and exploring their limitations. We also discuss the applicability of emerging approaches, such as Reinforcement Learning from Human Feedback and Mamba, to autonomous driving. Finally, we highlight open questions and offer insights into promising directions for future research. This paper is part of a living document that will be updated based on the LLVM-AD workshop series to reflect the latest developments in the field.