Abstract: Recent advances in self-supervised learning in the
point cloud domain have demonstrated significant potential. However, these methods often suffer from drawbacks,
including lengthy pre-training times, the need for reconstruction in the input space, or reliance on additional
modalities. To address these issues, we introduce
Point-JEPA, a joint embedding predictive architecture designed specifically for point cloud data. To this end, we
introduce a sequencer that orders point cloud tokens so that their proximity can be efficiently computed and exploited from their
indices during target and context selection. The sequencer
also allows the proximity computation to be shared between context and target selection, further improving efficiency. Experimentally, our method demonstrates
state-of-the-art performance while avoiding reconstruction in the input space and additional modalities. In particular, Point-JEPA attains a linear-SVM classification accuracy of 93.7±0.2%
on ModelNet40 and 92.9±0.4% on the
ScanObjectNN OBJ-BG dataset, surpassing all other self-supervised models. Moreover, Point-JEPA also establishes
new state-of-the-art performance across all four few-shot learning evaluation settings. Code is available at
https://github.com/Ayumu-J-S/Point-JEPA
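The sequencer idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's exact algorithm; it assumes a hypothetical greedy nearest-neighbor ordering of token center coordinates, so that tokens with nearby indices are spatially close and contiguous index spans can serve as context and target blocks:

```python
# Illustrative sketch (not the paper's exact method): order point-cloud
# token centers by greedy nearest-neighbor so index proximity approximates
# spatial proximity; contiguous index spans then form context/target blocks.
import numpy as np

def sequence_tokens(centers: np.ndarray) -> np.ndarray:
    """Return a permutation of token indices via greedy nearest-neighbor."""
    n = len(centers)
    remaining = set(range(n))
    order = [0]                      # start from an arbitrary token
    remaining.remove(0)
    while remaining:
        last = centers[order[-1]]
        # pick the unvisited token closest to the last-placed one
        nxt = min(remaining, key=lambda i: np.sum((centers[i] - last) ** 2))
        order.append(nxt)
        remaining.remove(nxt)
    return np.array(order)

rng = np.random.default_rng(0)
centers = rng.random((16, 3))        # hypothetical token center coordinates
order = sequence_tokens(centers)
context = order[:8]                  # contiguous span -> spatially coherent block
target = order[8:12]                 # disjoint span for the prediction target
```

Because the ordering is computed once, the same index-based proximity is reused for both context and target selection, which is the efficiency point the abstract highlights.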