Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models

Published: 05 Nov 2023, Last Modified: 04 Nov 2023 · OOD Workshop @ CoRL 2023
Keywords: End-to-end Driving, Generalization, Foundation Models
TL;DR: We use multi-modal foundation models to improve generalization of end-to-end autonomous driving.
Abstract: As autonomous driving technology matures, end-to-end methodologies have emerged as a leading strategy, promising seamless integration from perception to control via deep learning. However, existing systems grapple with challenges such as unexpected open-set environments and the complexity of black-box models. At the same time, the evolution of deep learning has introduced larger, multi-modal foundation models offering joint visual and textual understanding. In this paper, we harness these multi-modal foundation models to enhance the robustness and adaptability of autonomous driving systems. We introduce a method to extract nuanced spatial features from transformers and incorporate latent space simulation for improved training and policy debugging. We use pixel/patch-aligned feature descriptors to extend foundation model capabilities into an end-to-end multi-modal driving model, demonstrating strong results across diverse tests. Our solution combines language with visual perception and achieves significantly greater robustness in out-of-distribution situations. Check our website https://drive-anywhere.github.io for more videos and demos.
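To make the pixel/patch-aligned feature idea concrete, below is a minimal sketch (not the paper's implementation) of extracting per-patch features from a CLIP-style vision transformer via the Hugging Face transformers library and grounding them against a text query. The backbone name, image file, and prompt are placeholders, and reusing the model's pooled-output projection for individual patch tokens is a simplifying assumption.

```python
# Minimal sketch: patch-aligned visual features from a CLIP-style vision
# transformer, compared against a text prompt to obtain a coarse per-patch,
# language-grounded relevance map. Illustrative only; the paper's actual
# backbone, projection, and policy head may differ.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")
model.eval()

image = Image.open("front_camera.jpg")     # hypothetical driving frame
prompt = "a pedestrian crossing the road"  # hypothetical language query

inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    # Per-patch tokens from the vision transformer (drop the CLS token).
    vision_out = model.vision_model(pixel_values=inputs["pixel_values"])
    patch_tokens = vision_out.last_hidden_state[:, 1:, :]       # (1, N, D_v)

    # Project patch tokens into the shared image-text space. Applying the
    # pooled-output projection to patch tokens is an approximation.
    patch_feats = model.visual_projection(patch_tokens)          # (1, N, D)
    patch_feats = patch_feats / patch_feats.norm(dim=-1, keepdim=True)

    text_feats = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

    # Cosine similarity per patch, reshaped to a spatial grid
    # (e.g. 14x14 for a 224x224 input with 16x16 patches).
    sim = (patch_feats @ text_feats.T).squeeze(-1).squeeze(0)    # (N,)
    grid = int(sim.numel() ** 0.5)
    relevance_map = sim.reshape(grid, grid)

print(relevance_map.shape)  # e.g. torch.Size([14, 14])
```

Such a spatially aligned, language-queryable feature map is the kind of representation that could feed a downstream driving policy head; the paper's end-to-end model and latent space simulation build on richer machinery than this sketch shows.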
Submission Number: 16