CaNOCS: Category-Level 3D Correspondence from a Single Image

05 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: benchmark, semantic correspondences, morphable model, 6D pose, deformation
TL;DR: We introduce HouseCorr3D, a dataset of 2D–3D correspondences for 50 object categories, and CaNOCS, a framework that surpasses NOCS- and DINOv2-based baselines in category-level 3D correspondence, enabling finer object understanding for robotics and AR/VR.
Abstract: Recent progress in 6D object pose estimation has been driven by representations that map image pixels to normalized object coordinate spaces (NOCS). However, NOCS representations are tailored to pose estimation and fall short for detailed object understanding, since the same point in NOCS space may correspond to different semantic parts across object instances. We argue that the next frontier in object understanding is **category-level 3D correspondence**: predicting, from a single image, the canonical 3D location of each pixel in a way that is semantically aligned across all instances of a category. Such correspondences go beyond pose: they enable reasoning about function and interaction. To enable research in this direction, we introduce **HouseCorr3D**, the first dataset with dense semantic 2D–3D correspondences across 50 household object categories, including annotated CAD models, hundreds of real images per class, and amodal correspondences for occluded regions. We further propose **CaNOCS**, a framework that learns category-level **morphable shape priors** to enable 3D correspondence estimation that is semantically aligned across category instances. In extensive experiments, CaNOCS achieves substantially better category-level 3D correspondence than baselines based on NOCS or DINOv2. We believe that CaNOCS and HouseCorr3D establish a new paradigm that moves beyond 6D pose toward **fine-grained, correspondence-level object understanding**, with broad applications in robotics and AR/VR.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 2432