Abstract: Existing 3D mask learning methods encounter performance bottlenecks under limited data, and our objective is to overcome this limitation. In this paper, we introduce a triple point masking scheme, named TPM, a scalable plug-and-play framework for MAE pre-training that enables multi-mask learning for 3D point clouds. Specifically, we augment baseline methods with two additional mask choices (i.e., medium mask and low mask), based on our core insight that the recovery process of an object can manifest in diverse ways. Previous high-masking schemes focus on capturing global representation information but lack fine-grained recovery capability, so the resulting pre-trained weights tend to play a limited role during fine-tuning. With the support of the proposed TPM, current methods gain more flexible and accurate completion capabilities, allowing the autoencoder in the pre-training stage to consider multiple representations of a single 3D point cloud object. In addition, during the fine-tuning stage, we propose an SVM-guided weight selection module that initializes the downstream encoder with the optimal pre-trained weights, maximizing linear accuracy and facilitating the acquisition of intricate representations of new objects. Extensive experimental results and theoretical analysis show that five baselines equipped with the proposed TPM achieve comprehensive performance improvements on various downstream tasks. Our code and models are available at https://github.com/liujia99/TPM.
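To make the multi-mask idea concrete, the following is a minimal sketch (not the authors' implementation) of producing three masked views of one point cloud at high, medium, and low ratios; the specific ratios (0.9, 0.6, 0.3) and function names are assumptions for illustration only.

```python
import numpy as np

def mask_points(points, ratio, rng):
    """Randomly hide `ratio` of the points; return (visible, masked) subsets."""
    n = points.shape[0]
    n_masked = int(n * ratio)  # number of points removed at this ratio
    perm = rng.permutation(n)
    visible_idx, masked_idx = perm[n_masked:], perm[:n_masked]
    return points[visible_idx], points[masked_idx]

def triple_mask(points, ratios=(0.9, 0.6, 0.3), seed=0):
    """Three masked views of one object, as in multi-mask pre-training.

    The ratios are hypothetical stand-ins for the high/medium/low masks;
    a masked autoencoder would reconstruct the hidden points of each view.
    """
    rng = np.random.default_rng(seed)
    return [mask_points(points, r, rng) for r in ratios]

# Example: a synthetic 1024-point cloud with xyz coordinates.
cloud = np.random.default_rng(1).standard_normal((1024, 3))
views = triple_mask(cloud)
for (visible, masked), r in zip(views, (0.9, 0.6, 0.3)):
    print(f"ratio={r}: visible={visible.shape[0]}, masked={masked.shape[0]}")
```

Each view poses a different recovery difficulty for the same object, which is the property the abstract argues a single high-mask ratio cannot provide.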
External IDs: dblp:journals/tcsv/LiuKWGLMMQ25