PointOfView: A Multi-modal Network for Few-shot 3D Point Cloud Classification Fusing Point and Multi-view Image Features

Published: 01 Jan 2024 · Last Modified: 10 Jan 2025 · CVPR Workshops 2024 · License: CC BY-SA 4.0
Abstract: Most existing 3D point cloud analysis approaches rely on traditional supervised learning, which requires large amounts of labeled data, yet data annotation is labor-intensive and costly. Moreover, although many existing works use either raw 3D point clouds or multiple 2D depth images, their joint use remains relatively under-explored. To address these issues, we propose PointOfView, a novel multi-modal few-shot 3D point cloud classification model that classifies never-before-seen classes from only a few annotated samples. A 2D multi-view learning branch processes multiple projection images; it contains two sub-branches that extract information both at the individual image level and across all six depth images. In addition, we propose a multi-scale 2D pooling layer that applies 2D max-pooling and 2D average-pooling operations with different pooling sizes, allowing features to be fused at multiple scales. The second main branch processes raw 3D point clouds by first sorting them and then extracting features with DGCNN. We perform within-dataset and cross-domain experiments on the ModelNet40, ModelNet40-C, and ScanObjectNN datasets and compare against six state-of-the-art baselines. Our approach outperforms all baselines in every experimental setting, achieving state-of-the-art performance.
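To make the multi-scale 2D pooling idea concrete, the following is a minimal PyTorch sketch of one plausible realization: parallel 2D max- and average-pooling at several kernel sizes, with the pooled maps upsampled back to the input resolution and fused by channel-wise concatenation. The class name MultiScale2DPooling, the pooling sizes (2, 4, 8), and the concatenation-based fusion are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScale2DPooling(nn.Module):
    """Hypothetical multi-scale 2D pooling: max- and average-pool the
    input feature maps at several scales, then fuse the results."""

    def __init__(self, pool_sizes=(2, 4, 8)):  # assumed pooling sizes
        super().__init__()
        self.pool_sizes = pool_sizes

    def forward(self, x):
        # x: (B, C, H, W) feature maps from the 2D multi-view branch
        h, w = x.shape[-2:]
        pooled = []
        for k in self.pool_sizes:
            # Pool at scale k, then upsample back to (H, W) so maps
            # from all scales can be fused channel-wise.
            mx = F.interpolate(F.max_pool2d(x, k), size=(h, w), mode="nearest")
            av = F.interpolate(F.avg_pool2d(x, k), size=(h, w), mode="nearest")
            pooled.extend([mx, av])
        # Output: (B, 2 * C * len(pool_sizes), H, W)
        return torch.cat(pooled, dim=1)


# Usage example with dummy multi-view feature maps:
layer = MultiScale2DPooling()
feats = torch.randn(2, 64, 32, 32)
out = layer(feats)  # torch.Size([2, 384, 32, 32])
```

Concatenation is only one fusion choice; summation or a learned 1x1 convolution over the pooled maps would also match the abstract's description of "fusing features at different scales."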