Abstract: Recent advances in self-supervised learning in the
point cloud domain have demonstrated significant potential. However, these methods often suffer from drawbacks,
including lengthy pre-training times, the need for reconstruction in the input space, or reliance on additional
modalities. To address these issues, we introduce
Point-JEPA, a joint embedding predictive architecture designed specifically for point cloud data. To this end, we
introduce a sequencer that orders point cloud tokens so that their proximity can be efficiently computed and exploited from their
indices during target and context selection. The sequencer
also allows the proximity computation to be shared between context and target selection, further improving efficiency. Experimentally, our method demonstrates
state-of-the-art performance while avoiding reconstruction in the input space and additional modalities. In particular, Point-JEPA attains a linear-SVM classification accuracy of 93.7±0.2%
on ModelNet40 and 92.9±0.4% on the
ScanObjectNN OBJ-BG dataset, surpassing all other self-supervised models. Moreover, Point-JEPA also establishes
new state-of-the-art performance across all four few-shot learning evaluation settings. Code is available at
https://github.com/Ayumu-J-S/Point-JEPA
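The sequencer idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's exact algorithm; it assumes a hypothetical greedy nearest-neighbor ordering of token center coordinates, so that tokens with nearby indices are spatially close and contiguous index spans can serve as context and target blocks:

```python
# Illustrative sketch (not the paper's exact method): order point-cloud
# token centers by greedy nearest-neighbor so index proximity approximates
# spatial proximity; contiguous index spans then form context/target blocks.
import numpy as np

def sequence_tokens(centers: np.ndarray) -> np.ndarray:
    """Return a permutation of token indices via greedy nearest-neighbor."""
    n = len(centers)
    remaining = set(range(n))
    order = [0]                      # start from an arbitrary token
    remaining.remove(0)
    while remaining:
        last = centers[order[-1]]
        # pick the unvisited token closest to the last-placed one
        nxt = min(remaining, key=lambda i: np.sum((centers[i] - last) ** 2))
        order.append(nxt)
        remaining.remove(nxt)
    return np.array(order)

rng = np.random.default_rng(0)
centers = rng.random((16, 3))        # hypothetical token center coordinates
order = sequence_tokens(centers)
context = order[:8]                  # contiguous span -> spatially coherent block
target = order[8:12]                 # disjoint span for the prediction target
```

Because the ordering is computed once, the same index-based proximity is reused for both context and target selection, which is the efficiency point the abstract highlights.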