Cross-Modal Information-Guided Network using Contrastive Learning for Point Cloud Registration
Abstract: Most current point cloud registration algorithms rely on features extracted from points alone. However, such methods depend on information from a single modality, which leads to deficiencies such as weak perception of global shape and a lack of texture information. In fact, humans employ visual information learned from 2D images to comprehend the 3D world. Motivated by this observation, we present a novel Cross-Modal Information-Guided Network (CMIGNet), which exploits cross-modal information to obtain global shape perception for precise and robust point cloud registration. Specifically, CMIGNet employs two contrastive learning strategies, namely overlapping contrastive learning and cross-modal contrastive learning. The former focuses on features in overlapping regions, while the latter emphasizes the correspondences between 2D and 3D features. Furthermore, an attention mechanism facilitates information interaction and generates hybrid features. These hybrid features are then fed into a mask prediction module to identify keypoints in the point clouds. Finally, the hybrid features and spatial coordinates independently guide the search for point-to-point correspondences, and a weighted SVD algorithm computes the final rigid transformation. Extensive experiments on several benchmark datasets demonstrate that our network achieves superior registration performance.
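To make the cross-modal contrastive objective concrete, the sketch below shows a generic symmetric InfoNCE loss of the kind commonly used to align paired 2D and 3D features. This is an illustration under stated assumptions, not the authors' implementation: the function name `cross_modal_infonce`, the temperature `tau`, and the convention that row i of each feature matrix forms a positive pair (all other rows acting as negatives) are assumptions for the example; the paper's exact loss may differ.

```python
import torch
import torch.nn.functional as F

def cross_modal_infonce(feat_3d: torch.Tensor, feat_2d: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Generic symmetric InfoNCE sketch (illustrative, not the paper's exact loss).

    feat_3d, feat_2d: (N, D) paired features; row i of each modality is
    assumed to be a positive pair, and all other rows serve as negatives.
    """
    z3 = F.normalize(feat_3d, dim=1)          # unit-normalize 3D features
    z2 = F.normalize(feat_2d, dim=1)          # unit-normalize 2D features
    logits = z3 @ z2.t() / tau                # (N, N) cosine-similarity logits
    labels = torch.arange(z3.size(0), device=z3.device)  # diagonal = positives
    # Symmetric loss: contrast 3D against 2D and 2D against 3D
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))
```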
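The final step the abstract names, weighted SVD, is the standard weighted Kabsch procedure for recovering a rigid transform from soft correspondences. A minimal NumPy sketch follows, assuming putative pairs (src[i] ↔ tgt[i]) with non-negative per-pair weights w[i]; it is a generic reference for the technique, not the network's code.

```python
import numpy as np

def weighted_svd(src: np.ndarray, tgt: np.ndarray, w: np.ndarray):
    """Weighted Kabsch sketch: find R, t minimizing sum_i w_i ||R src_i + t - tgt_i||^2.

    src, tgt: (N, 3) corresponding points; w: (N,) non-negative weights.
    """
    w = w / (w.sum() + 1e-12)                   # normalize weights
    mu_src = (w[:, None] * src).sum(0)          # weighted centroids
    mu_tgt = (w[:, None] * tgt).sum(0)
    src_c, tgt_c = src - mu_src, tgt - mu_tgt   # center both clouds
    H = (w[:, None] * src_c).T @ tgt_c          # 3x3 weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that det(R) = +1
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    t = mu_tgt - R @ mu_src
    return R, t
```

In a correspondence-based pipeline like the one the abstract describes, the weights would come from predicted correspondence confidences, so unreliable pairs contribute little to the estimated transformation.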