Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models

Published: 05 Nov 2023, Last Modified: 04 Nov 2023 · OOD Workshop @ CoRL 2023
Keywords: End-to-end Driving, Generalization, Foundation Models
TL;DR: We use multi-modal foundation models to improve generalization of end-to-end autonomous driving.
Abstract: As autonomous driving technology matures, end-to-end methodologies have emerged as a leading strategy, promising seamless integration from perception to control via deep learning. However, existing systems grapple with challenges such as unexpected open-set environments and the complexity of black-box models. At the same time, the evolution of deep learning has introduced larger, multi-modal foundation models offering joint visual and textual understanding. In this paper, we harness these multi-modal foundation models to enhance the robustness and adaptability of autonomous driving systems. We introduce a method to extract nuanced spatial features from transformers and incorporate latent space simulation for improved training and policy debugging. We use pixel/patch-aligned feature descriptors to extend foundation model capabilities into an end-to-end multi-modal driving model, demonstrating strong results across diverse tests. Our solution combines language with visual perception and achieves significantly greater robustness in out-of-distribution situations. Check our website https://drive-anywhere.github.io for more videos and demos.
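To make the pixel/patch-aligned feature idea concrete, below is a minimal sketch (not the paper's implementation) of extracting per-patch features from a CLIP-style vision transformer via the Hugging Face transformers library and grounding them against a text query. The backbone name, image file, and prompt are placeholders, and reusing the model's pooled-output projection for individual patch tokens is a simplifying assumption.

```python
# Minimal sketch: patch-aligned visual features from a CLIP-style vision
# transformer, compared against a text prompt to obtain a coarse per-patch,
# language-grounded relevance map. Illustrative only; the paper's actual
# backbone, projection, and policy head may differ.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")
model.eval()

image = Image.open("front_camera.jpg")     # hypothetical driving frame
prompt = "a pedestrian crossing the road"  # hypothetical language query

inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    # Per-patch tokens from the vision transformer (drop the CLS token).
    vision_out = model.vision_model(pixel_values=inputs["pixel_values"])
    patch_tokens = vision_out.last_hidden_state[:, 1:, :]       # (1, N, D_v)

    # Project patch tokens into the shared image-text space. Applying the
    # pooled-output projection to patch tokens is an approximation.
    patch_feats = model.visual_projection(patch_tokens)          # (1, N, D)
    patch_feats = patch_feats / patch_feats.norm(dim=-1, keepdim=True)

    text_feats = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

    # Cosine similarity per patch, reshaped to a spatial grid
    # (e.g. 14x14 for a 224x224 input with 16x16 patches).
    sim = (patch_feats @ text_feats.T).squeeze(-1).squeeze(0)    # (N,)
    grid = int(sim.numel() ** 0.5)
    relevance_map = sim.reshape(grid, grid)

print(relevance_map.shape)  # e.g. torch.Size([14, 14])
```

Such a spatially aligned, language-queryable feature map is the kind of representation that could feed a downstream driving policy head; the paper's end-to-end model and latent space simulation build on richer machinery than this sketch shows.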
Submission Number: 16