Towards Viewpoint-Robust End-to-End Autonomous Driving with 3D Foundation Model Priors

Published: 23 May 2026, Last Modified: 23 May 2026, SAD 2026, CC BY 4.0
Keywords: Autonomous Driving, Planning, Viewpoint Robustness, 3D Foundation Model
TL;DR: We propose an augmentation-free approach that leverages 3D foundation model priors (depth-derived 3D positions and intermediate features) to make end-to-end autonomous driving more robust to camera viewpoint changes.
Abstract: Robust trajectory planning under camera viewpoint changes is important for scalable end-to-end autonomous driving. However, existing models often depend heavily on the camera viewpoints seen during training. We investigate an augmentation-free approach that leverages geometric priors from a 3D foundation model. The method injects per-pixel 3D positions derived from depth estimates as positional embeddings and fuses intermediate geometric features through cross-attention. Experiments on the VR-Drive camera viewpoint perturbation benchmark show reduced performance degradation under most perturbation conditions, with clear improvements under pitch and height perturbations. Gains under longitudinal translation are smaller, suggesting that more viewpoint-agnostic integration is needed for robustness to camera viewpoint changes.
Submission Number: 14
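
The abstract mentions per-pixel 3D positions derived from depth estimates. As a rough illustration of what such a derivation can look like, the sketch below back-projects a depth map with a standard pinhole camera model. The abstract does not state how the positions are actually computed, so the pinhole assumption, the function name `unproject_depth`, and the intrinsics `fx`, `fy`, `cx`, `cy` are all hypothetical. The positional-embedding and cross-attention steps are not shown.

```python
def unproject_depth(depth, fx, fy, cx, cy):
    """Back-project a depth map to per-pixel 3D points in the camera frame.

    Assumes a pinhole camera (an assumption, not confirmed by the source).
    depth[v][u] is the depth along the optical axis at pixel column u, row v.
    Returns a nested list of (X, Y, Z) tuples with the same layout as depth.
    """
    points = []
    for v, row in enumerate(depth):
        points.append([
            ((u - cx) * z / fx, (v - cy) * z / fy, z)
            for u, z in enumerate(row)
        ])
    return points


# Toy 2x3 depth map with made-up intrinsics, for illustration only.
depth = [[2.0, 2.0, 2.0],
         [4.0, 4.0, 4.0]]
points = unproject_depth(depth, fx=2.0, fy=2.0, cx=1.0, cy=0.5)
```

Because each point is expressed in metric camera coordinates rather than pixel indices, a change in camera pose changes these positions in a geometrically predictable way, which is one plausible reason such positions could serve as a more viewpoint-aware positional signal.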