Cross-View Completion Models are Zero-shot Correspondence Estimators

Published: 2025, Last Modified: 07 Jan 2026, CVPR 2025, CC BY-SA 4.0
Abstract: In this work, we analyze new aspects of cross-view completion, mainly through the analogy between cross-view completion and traditional self-supervised correspondence learning algorithms. Based on this analysis, we reveal that the cross-attention map of CroCo-v2 reflects correspondence information better than correlations computed from its encoder or decoder features. We further verify the effectiveness of the cross-attention map through evaluations on zero-shot and supervised dense geometric correspondence, as well as on multi-frame depth estimation.
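The central idea, reading correspondences directly off a cross-attention map, can be illustrated with a minimal NumPy sketch. It assumes a single attention head, target-image patches as queries and source-image patches as keys, and a plain argmax readout per query. The function names `cross_attention_map` and `argmax_correspondences`, the identity projection matrices, and the synthetic tokens are hypothetical stand-ins, not CroCo-v2's actual layers, weights, or evaluation pipeline.

```python
import numpy as np

def cross_attention_map(tgt_tokens, src_tokens, w_q, w_k):
    """Single-head cross-attention probabilities.

    Rows index target (query) patches and columns index source (key) patches,
    so row i is a distribution over source locations for target patch i.
    """
    q = tgt_tokens @ w_q                          # (N_t, d) query projections
    k = src_tokens @ w_k                          # (N_s, d) key projections
    logits = q @ k.T / np.sqrt(q.shape[-1])       # scaled dot-product scores
    logits -= logits.max(axis=-1, keepdims=True)  # stabilize the softmax
    attn = np.exp(logits)
    return attn / attn.sum(axis=-1, keepdims=True)

def argmax_correspondences(attn, grid_w):
    """Map each target patch to the (row, col) of its most-attended source patch."""
    idx = attn.argmax(axis=-1)
    return np.stack([idx // grid_w, idx % grid_w], axis=-1)

# Toy check with hypothetical data (not CroCo-v2 features): a 4x4 patch grid
# whose target tokens are a shuffled copy of the source tokens.
rng = np.random.default_rng(0)
grid_h, grid_w, dim = 4, 4, 16
basis, _ = np.linalg.qr(rng.standard_normal((dim, dim)))  # orthonormal rows
src_tokens = 4.0 * basis[: grid_h * grid_w]                # one token per source patch
perm = rng.permutation(grid_h * grid_w)
tgt_tokens = src_tokens[perm]                              # target patch i matches source patch perm[i]

w = np.eye(dim)                                            # identity stand-in for learned projections
attn = cross_attention_map(tgt_tokens, src_tokens, w, w)
coords = argmax_correspondences(attn, grid_w)
print(coords.shape)  # (16, 2)
```

Taking the argmax of each row is the simplest way to turn attention into discrete matches; a soft-argmax over each row would give sub-patch estimates instead. The abstract does not state which readout is used, so treat this choice as an assumption of the sketch.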