Keywords: computer vision, correspondence, pointmap prediction
TL;DR: We present a dataset of floor plan-photo pairs from the Internet with pixel correspondences and camera poses, adapt DUSt3R to improve correspondence prediction, and identify systematic errors for future work.
Abstract: Geometric models like DUSt3R have driven great advances in understanding the geometry of a scene from pairs of photos. However, they fail when the inputs come from vastly different viewpoints (e.g., aerial vs.\ ground) or modalities (e.g., photos vs.\ abstract drawings) compared to what was observed during training. This paper addresses a challenging version of this problem: predicting correspondences between ground-level photos and floor plans. Current datasets for joint photo--floor plan reasoning are limited, either lacking varied modalities (VIGOR) or lacking correspondences (WAFFLE). To address these limitations, we introduce a new dataset, C3, created by first reconstructing a number of scenes in 3D from Internet photo collections via structure-from-motion, then manually registering the reconstructions to floor plans gathered from the Internet, from which we can derive correspondences between images and floor plans. C3 contains 90K paired floor plans and photos across 597 scenes, with 153M pixel-level correspondences and 85K camera poses. We find that state-of-the-art correspondence models struggle on this task. By training on our new data, we improve on the best-performing method by 34\% in RMSE. We also identify open challenges in cross-modal geometric reasoning that our dataset aims to help address. Our project website is available at: \url{https://c3po-correspondence.github.io/}.
Croissant File: json
Dataset URL: https://huggingface.co/datasets/kwhuang/C3
Code URL: https://github.com/c3po-correspondence/C3Po
Supplementary Material: zip
Primary Area: Datasets & Benchmarks for applications in computer vision
Submission Number: 2406