Cost Volume Meets Prompt: Enhancing MVS with Prompts for Autonomous Driving

Qihao Sun; Jiarun Liu; Ziqian Ni; Jianyun Xu; Sheng Yang

19 Sept 2025 (modified: 14 Nov 2025); ICLR 2026 Conference Withdrawn Submission; License: CC BY 4.0

Keywords: multi-view stereo, depth estimation

Abstract: Metric depth is foundational for perception, prediction, and planning in autonomous driving. Recent zero-shot metric depth foundation models still exhibit substantial distortions over large depth ranges and under diverse illumination. Multi-view stereo (MVS) offers geometric consistency, but it fails in regions with weak parallax or little texture. Conversely, directly using sparse LiDAR points as per-view prompts introduces noise and gaps caused by occlusion, sparsity, and projection misalignment. To address these challenges, we introduce Prompt-MVS, a cross-view prompt-enhanced framework for metric depth estimation. Our key insight is to inject LiDAR-derived prompts into cost volume construction through a differentiable, matching-aware fusion module, which lets the model exploit accurate metric cues while preserving the dense geometric consistency provided by the MVS process. We further propose depth-spatial alternating attention (DSAA), which combines spatial information with depth context and significantly improves multi-view geometric consistency. Experiments on KITTI, DDAD, and NYUv2 demonstrate the effectiveness of Prompt-MVS, which outperforms state-of-the-art methods by up to 34.6% in scale consistency. Notably, our method remains effective even with missing or highly sparse prompts and produces stable metric depth under severe occlusion and weak texture and in long-range scenes, demonstrating strong robustness and generalization. Our code will be made publicly available.
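To make "injecting a depth prompt into the cost volume" concrete, here is a minimal, hypothetical sketch for a single pixel. It is not the Prompt-MVS fusion module, whose details are not given in the abstract. Instead it swaps in a simple additive Gaussian bias on the matching cost, similar in spirit to the Gaussian cost modulation used in guided stereo matching, followed by a soft-argmin depth readout. The names modulate_costs and soft_argmin and the parameters sigma and weight are illustrative assumptions. The sketch does not model the "matching-aware" weighting described above; it only shows how a prompt can reshape a cost distribution while a missing prompt falls back to plain MVS matching.

```python
import math


def modulate_costs(costs, depths, prompt=None, sigma=1.0, weight=1.0):
    """Hypothetical prompt fusion for one pixel (not the paper's module).

    costs:  matching cost per depth hypothesis (lower = better match).
    depths: depth value in metres for each hypothesis.
    prompt: sparse LiDAR depth projected to this pixel, or None if absent.
    Subtracts a Gaussian bump centred on the prompt depth, so hypotheses
    near the LiDAR measurement become cheaper.
    """
    if prompt is None:
        # No LiDAR point here: keep the pure multi-view matching costs.
        return list(costs)
    return [
        c - weight * math.exp(-((d - prompt) ** 2) / (2.0 * sigma ** 2))
        for c, d in zip(costs, depths)
    ]


def soft_argmin(costs, depths):
    """Differentiable depth estimate: expected depth under softmax(-cost)."""
    m = min(costs)
    # Shift by the minimum cost for numerical stability before exponentiating.
    w = [math.exp(-(c - m)) for c in costs]
    total = sum(w)
    return sum(wi * d for wi, d in zip(w, depths)) / total


depths = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]  # depth hypotheses in metres
flat = [1.0] * len(depths)  # textureless pixel: every hypothesis matches equally well
print(soft_argmin(modulate_costs(flat, depths), depths))  # 17.5
# With a 10 m LiDAR prompt, the estimate is pulled toward the prompt depth.
print(soft_argmin(modulate_costs(flat, depths, prompt=10.0, sigma=2.5, weight=3.0), depths))
```

On a flat cost profile, pure matching returns the mean of the hypotheses, which is the failure mode the abstract attributes to textureless regions. The prompt reshapes the distribution toward the measured depth. Because the bias is added to the costs rather than replacing them, a strong matching minimum elsewhere can still win, which is one simple way to tolerate noisy or misaligned prompts.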

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 16275