Abstract: As a new pre-training paradigm, masked autoencoding has significantly advanced self-supervised learning in NLP and computer vision. However, it remains underexplored whether masked autoencoding can be generalized to feature learning on point clouds. In this paper, we present a novel self-supervised learning framework based on Joint Masked Autoencoding with global reconstruction (JMA). The key idea is to randomly mask some patches of a point cloud and use the visible patches to reconstruct the masked ones. In contrast to previous methods based on masked autoencoding, our JMA splits the point cloud into multiple partitions and uses each partition to predict all the others, which amounts to learning multiple masked autoencoding tasks simultaneously. Moreover, each partition is supervised to reconstruct the global shape of the point cloud, which enables the model to capture the global structure of the original input. Extensive experiments demonstrate the advantages of the proposed method on various downstream tasks. Specifically, on the widely used ModelNet40 dataset and the more challenging ScanObjectNN dataset, the pre-trained model achieves consistently improved performance.
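The partition-and-cross-predict scheme described above can be sketched as follows. This is a minimal illustration under assumed conventions (the function names, the number of partitions, and the use of disjoint random partitions are our assumptions, not details taken from the paper): patch indices are split into disjoint partitions, and each partition serves as the visible set of one masked-autoencoding task whose targets are all remaining patches.

```python
import numpy as np

def partition_patches(num_patches, num_partitions, rng):
    # Randomly split patch indices into disjoint partitions (assumed scheme).
    idx = rng.permutation(num_patches)
    return np.array_split(idx, num_partitions)

def build_jma_tasks(num_patches, num_partitions, seed=0):
    # One masked-autoencoding task per partition: that partition is the
    # visible set, and every other patch is a reconstruction target.
    rng = np.random.default_rng(seed)
    parts = partition_patches(num_patches, num_partitions, rng)
    tasks = []
    for i, visible in enumerate(parts):
        masked = np.concatenate([p for j, p in enumerate(parts) if j != i])
        tasks.append((np.sort(visible), np.sort(masked)))
    return tasks

# Example: 64 patches split into 4 partitions yields 4 joint tasks,
# each with 16 visible patches and 48 masked targets.
tasks = build_jma_tasks(num_patches=64, num_partitions=4)
```

In this reading, the global-reconstruction term would add, for each task, a supervision signal that regresses the full shape from the single visible partition; that loss is omitted here since the abstract does not specify its form.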