MV3D-MAE: 2D Pre-trained MAEs are Effective 3D Representation Learners

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Point Cloud, MAE, Multi-view Depth Image, 2D-3D
Abstract: Deep learning's success relies heavily on the availability of extensive labelled datasets. Acquiring 3D data is substantially more expensive and time-consuming than acquiring 2D data. Current multi-modal self-supervised approaches often convert 3D data into 2D data for parallel multi-modal training, ignoring the prior knowledge contained in extensively trained 2D models. It is therefore important to find ways to utilize 2D feature priors to facilitate the learning of 3D models. In this paper, we propose MV3D-MAE, a masked autoencoder framework that utilizes a pre-trained 2D MAE model to enhance 3D representation learning. We first convert a single 3D point cloud into multi-view depth images. Building on a pre-trained 2D MAE, we adapt the model for multi-view depth image reconstruction by integrating group attention and adding attention layers. We then propose Mv-Swin, a differentiable 3D reconstruction method that maps the reconstructed images back to 3D objects without the use of camera poses, thereby learning 3D spatial representations. Through this bidirectional transformation between 2D and 3D data, MV3D-MAE mitigates the differences between modalities and enhances the network's representational performance by leveraging the prior knowledge in the pre-trained 2D MAE. Our model significantly improves performance in few-shot classification and achieves SOTA results in linear Support Vector Machine classification. It also demonstrates competitive performance on other downstream classification and segmentation tasks on both synthetic and real-world datasets.
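The abstract's first stage, rendering a single point cloud into multi-view depth images, can be illustrated with a minimal sketch. This is not the authors' implementation: the view count, image resolution, orthographic projection, and z-buffering scheme below are illustrative assumptions about how such a conversion might look.

```python
# Hypothetical sketch of point cloud -> multi-view depth images.
# Assumptions (not from the paper): a ring of virtual cameras around the
# up-axis, orthographic projection, and a simple z-buffer per view.
import numpy as np

def point_cloud_to_depth_views(points, num_views=6, resolution=224):
    """Render a point cloud (N, 3) into (num_views, resolution, resolution) depth maps."""
    # Normalize the cloud into the unit sphere so all views share one scale.
    points = points - points.mean(axis=0)
    points = points / (np.linalg.norm(points, axis=1).max() + 1e-8)

    depth_maps = []
    for v in range(num_views):
        # Rotate the cloud about the up-axis to simulate the v-th camera.
        theta = 2.0 * np.pi * v / num_views
        rot = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                        [0.0,           1.0, 0.0],
                        [-np.sin(theta), 0.0, np.cos(theta)]])
        p = points @ rot.T

        # Orthographic projection: x/y give pixel coordinates, z is depth.
        u = ((p[:, 0] + 1.0) / 2.0 * (resolution - 1)).astype(int)
        w = ((p[:, 1] + 1.0) / 2.0 * (resolution - 1)).astype(int)
        z = (p[:, 2] + 1.0) / 2.0  # depth rescaled to [0, 1]

        # Z-buffer: keep the nearest point that lands on each pixel.
        depth = np.full((resolution, resolution), np.inf)
        np.minimum.at(depth, (w, u), z)
        depth[np.isinf(depth)] = 0.0  # empty pixels become background
        depth_maps.append(depth)
    return np.stack(depth_maps)

# Example: six 224x224 depth views of a random 2048-point cloud.
cloud = np.random.randn(2048, 3).astype(np.float32)
views = point_cloud_to_depth_views(cloud)
print(views.shape)  # (6, 224, 224)
```

Under these assumptions, the resulting depth views can be fed to a 2D MAE as single-channel images; the paper's actual rendering, masking, and Mv-Swin back-projection details are described in the full text.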
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9672