Pic@Point: Cross-Modal Learning by Local and Global Point-Picture Correspondence

Published: 05 Sept 2024 · Last Modified: 16 Oct 2024 · ACML 2024 Conference Track · CC BY 4.0
Keywords: self-supervised pre-training, cross-modal, 2D-3D correspondence, point clouds
TL;DR: This paper introduces Pic@Point, a self-supervised pre-training method for 3D point clouds that leverages structural point-picture correspondences.
Abstract: Self-supervised pre-training has achieved remarkable success in NLP and 2D vision, but these advances have yet to translate to 3D data. Techniques like masked reconstruction face inherent challenges on unstructured point clouds, while many contrastive learning tasks lack the complexity and informative value needed to learn rich representations. In this paper, we present Pic@Point, an effective contrastive learning method based on structural 2D-3D correspondences. We leverage image cues rich in semantic and contextual knowledge to provide a guiding signal for point cloud representations at various abstraction levels. Our lightweight approach outperforms state-of-the-art pre-training methods on several 3D benchmarks.
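To make the abstract's core idea concrete, below is a minimal sketch of the kind of point-picture contrastive objective described: point cloud features at some abstraction level (global, or one local region per sample) are aligned with features of a corresponding image via a symmetric InfoNCE loss. This is not the authors' released code; the function name, tensor shapes, and the temperature value are illustrative assumptions.

```python
# Hypothetical sketch of a point-picture contrastive loss, assuming
# pre-extracted, paired embeddings. Names and shapes are assumptions,
# not taken from the Pic@Point paper.
import torch
import torch.nn.functional as F

def point_picture_infonce(point_feats: torch.Tensor,
                          img_feats: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE between matched point-cloud and image embeddings.

    point_feats: (B, D) pooled point features (global or per-region)
    img_feats:   (B, D) features of the corresponding image views
    """
    # Cosine-normalize so dot products become similarities in [-1, 1].
    p = F.normalize(point_feats, dim=-1)
    v = F.normalize(img_feats, dim=-1)
    logits = p @ v.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(p.size(0), device=p.device)
    # Matched pairs lie on the diagonal; all other pairs act as negatives.
    loss_p2v = F.cross_entropy(logits, targets)
    loss_v2p = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_p2v + loss_v2p)

# Toy usage: 8 point-cloud/image pairs with 256-dim embeddings.
if __name__ == "__main__":
    torch.manual_seed(0)
    pf = torch.randn(8, 256)
    vf = torch.randn(8, 256)
    print(point_picture_infonce(pf, vf).item())
```

In a multi-level setup as the abstract suggests, a loss of this form could be applied once on globally pooled features and again on matched local region/patch pairs, with the terms summed.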
Primary Area: Deep Learning (architectures, deep reinforcement learning, generative models, deep learning theory, etc.)
Student Author: Yes
Submission Number: 201