Keywords: Image-to-point cloud registration, Multi-modal learning
Abstract: Aligning 2D images with 3D point clouds remains challenging due to intrinsic modality differences. In this paper, we introduce Dual-view Matching Aggregation (DuMA), a novel image-to-point cloud registration framework designed to address this challenge. Our approach adopts a dual-view matching strategy that harmonizes 2D-3D and 3D-3D correspondences, leveraging complementary cues from both modalities. We design a score aggregation module that fuses the two sets of correspondence scores by analyzing neighborhood relationships, which induces a geometric verification effect and enforces spatial consistency. To reduce the computational burden of high-dimensional score aggregation, we further propose an Anchor-Pivot 5D encoder that decomposes and processes the multi-modal scores. Extensive experiments on challenging indoor and outdoor datasets demonstrate that our method significantly reduces matching ambiguity and remains robust and effective in complex scenes. Code and models will be made available: TBD.
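The abstract describes the method only at a high level, so the following is a rough conceptual sketch of the dual-view score-fusion idea, not DuMA's actual modules: every name and parameter here (aggregate_scores, knn_indices, alpha, k) is a hypothetical placeholder. The sketch fuses a 2D-3D and a 3D-3D correspondence score map and then smooths the result over each side's k-nearest-neighbor graph, so that a match is reinforced only when its spatial neighbors agree, a soft form of the geometric verification the abstract mentions.

```python
import numpy as np

def knn_indices(points, k):
    """Indices of each row's k nearest neighbors (self excluded)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.argsort(d, axis=1)[:, 1:k + 1]

def aggregate_scores(s_2d3d, s_3d3d, img_pts, cloud_pts, k=4, alpha=0.5):
    """Fuse two M x N correspondence score maps, then smooth the fused map
    over the k-NN graphs of both point sets so that spatially consistent
    matches are rewarded (a stand-in for neighborhood-based aggregation)."""
    fused = alpha * s_2d3d + (1.0 - alpha) * s_3d3d
    nn_img = knn_indices(img_pts, k)      # (M, k) neighbors on the image side
    nn_cloud = knn_indices(cloud_pts, k)  # (N, k) neighbors on the cloud side
    fused = 0.5 * (fused + fused[nn_img].mean(axis=1))       # row smoothing
    fused = 0.5 * (fused + fused[:, nn_cloud].mean(axis=2))  # column smoothing
    return fused

# Toy usage: 6 image-side keypoints matched against 8 cloud points.
rng = np.random.default_rng(0)
img_pts = rng.random((6, 2))    # 2D keypoint coordinates
cloud_pts = rng.random((8, 3))  # 3D point coordinates
s_2d3d = rng.random((6, 8))     # direct 2D-3D matching scores
s_3d3d = rng.random((6, 8))     # e.g. scores after lifting pixels to 3D
print(aggregate_scores(s_2d3d, s_3d3d, img_pts, cloud_pts).shape)  # (6, 8)
```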
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 8702