Abstract: We present Point-Voxel CNN (PVCNN) for efficient, fast 3D deep learning. Previous work processes 3D data using either voxel-based or point-based neural network models; however, both approaches are computationally inefficient. The computation cost and memory footprint of voxel-based networks grow cubically with the input resolution, making it memory-prohibitive to scale up the resolution. As for point-based networks, up to 80% of the time is wasted on structuring the irregular data, which has poor memory locality, rather than on the actual feature extraction. In this paper, we propose PVCNN, which represents the 3D data as points to reduce the memory consumption, while performing the convolutions in voxels to greatly reduce irregular data access and improve locality. Our PVCNN model is both memory- and computation-efficient. Evaluated on semantic and part segmentation datasets, it achieves much higher accuracy than the voxel-based baseline with a 7× GPU memory reduction; it also outperforms the state-of-the-art point-based and voxel-based models with a 6× measured speedup on average. Remarkably, a narrow version of our PVCNN achieves a 1.9× speedup over PointNet (an extremely efficient model) with much higher accuracy. We validate the general effectiveness of PVCNN on 3D object detection: by replacing the primitives in Frustum PointNet with our PVConv, it outperforms Frustum PointNet++ by 2.4% mAP on average, with a 1.5× speedup and a 2× GPU memory reduction.
Code Link: https://pvcnn.mit.edu
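
To make the point-voxel mechanism concrete, below is a minimal, hypothetical sketch of a point-voxel convolution layer written against plain PyTorch. It is not the authors' reference implementation (that is available at the code link above): the class name PVConvSketch, the resolution parameter, and the nearest-voxel devoxelization are illustrative simplifications of the idea described in the abstract (the paper uses trilinear devoxelization).

import torch
import torch.nn as nn


class PVConvSketch(nn.Module):
    """Illustrative sketch of a point-voxel convolution: a coarse voxel
    branch for neighborhood aggregation plus a per-point MLP branch."""

    def __init__(self, in_channels: int, out_channels: int, resolution: int = 32):
        super().__init__()
        self.r = resolution
        # Voxel branch: 3D convolution on a coarse regular grid gives
        # cache-friendly, contiguous memory access.
        self.voxel_conv = nn.Sequential(
            nn.Conv3d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_channels),
            nn.ReLU(inplace=True),
        )
        # Point branch: a shared 1x1 convolution (per-point MLP) keeps
        # fine-grained detail at full point resolution, with low memory.
        self.point_mlp = nn.Sequential(
            nn.Conv1d(in_channels, out_channels, kernel_size=1),
            nn.BatchNorm1d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, features: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
        # features: (B, C, N) per-point features; coords: (B, 3, N), normalized to [0, 1].
        B, C, N = features.shape
        r = self.r
        idx = (coords * (r - 1)).round().long().clamp_(0, r - 1)  # (B, 3, N)
        flat = (idx[:, 0] * r + idx[:, 1]) * r + idx[:, 2]        # (B, N) linear voxel index

        # Voxelize: average the features of all points falling into each voxel.
        grid = features.new_zeros(B, C, r * r * r)
        count = features.new_zeros(B, 1, r * r * r)
        grid.scatter_add_(2, flat.unsqueeze(1).expand(-1, C, -1), features)
        count.scatter_add_(2, flat.unsqueeze(1),
                           torch.ones_like(flat, dtype=features.dtype).unsqueeze(1))
        grid = grid / count.clamp(min=1)

        # Convolve in the regular voxel domain, then devoxelize by gathering
        # each point's (nearest) voxel feature back out.
        voxel_out = self.voxel_conv(grid.view(B, C, r, r, r)).view(B, -1, r * r * r)
        voxel_feats = voxel_out.gather(2, flat.unsqueeze(1).expand(-1, voxel_out.size(1), -1))

        # Fuse coarse voxel context with fine per-point features.
        return voxel_feats + self.point_mlp(features)

A toy invocation such as PVConvSketch(32, 64)(torch.rand(2, 32, 1024), torch.rand(2, 3, 1024)) returns (2, 64, 1024) features. The design intuition from the abstract is visible here: the grid stays small (memory is O(r^3) in the coarse resolution r rather than the full input resolution), while neighbor aggregation happens through regular convolutions instead of the irregular neighbor searches that dominate the runtime of purely point-based networks.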