Image2Point: 3D Point-Cloud Understanding with 2D Image Pretrained Models

Chenfeng Xu; Shijia Yang; Bohan Zhai; Bichen Wu; Xiangyu Yue; Wei Zhan; Peter Vajda; Kurt Keutzer; Masayoshi Tomizuka

Published: 28 Jan 2022, Last Modified: 22 Jun 2025 | ICLR 2022 Submitted | Readers: Everyone

Keywords: Computer vision, Point-cloud, Cross-modality.

Abstract: 3D point-clouds and 2D images are different visual representations of the physical world. While human vision can understand both representations, computer vision models designed for 2D image and 3D point-cloud understanding are quite different. Our paper explores the potential for transfer between these two representations by empirically investigating the feasibility of the transfer and its benefits, and by shedding light on why the transfer works. We discover that the same architecture and pretrained weights of a neural network can indeed be used to understand both images and point-clouds. Specifically, we transfer a pretrained image model to a point-cloud model by inflating its 2D convolutional filters to 3D and then finetuning the image-pretrained model (FIP). Surprisingly, models with minimal finetuning (only the input, output, and optionally batch normalization layers) achieve competitive performance on 3D point-cloud classification, beating a wide range of point-cloud models that adopt task-specific architectures and use a variety of tricks. Finetuning the whole model improves performance significantly further. FIP also improves data efficiency, achieving a gain of up to 10.0 points in top-1 accuracy on few-shot classification, and it speeds up the training of point-cloud models by up to 11.1x to reach a target accuracy.
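
The two steps named in the abstract, filter inflation and selective finetuning, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not say how a filter is inflated, so repeating the 2D kernel along a new depth axis (with an optional 1/depth rescaling) is an assumption here, and the layer names `input_layer`, `output_layer`, and `bn` are hypothetical.

```python
def inflate_kernel(kernel_2d, depth, rescale=False):
    """Inflate a 2D kernel (a list of rows) into a 3D kernel.

    Assumption: the kernel is repeated `depth` times along a new leading
    axis. With rescale=True each copy is divided by `depth`, which keeps the
    response to an input that is constant along depth unchanged.
    """
    scale = 1.0 / depth if rescale else 1.0
    return [[[w * scale for w in row] for row in kernel_2d] for _ in range(depth)]


def trainable_parameters(param_names, finetune_bn=True):
    """Select the parameters to finetune: input layer, output layer, and
    optionally batch normalization. All other parameters stay frozen.

    The name conventions below are hypothetical, chosen for illustration.
    """
    selected = []
    for name in param_names:
        parts = name.split(".")
        is_io = parts[0] in ("input_layer", "output_layer")
        is_bn = finetune_bn and "bn" in parts
        if is_io or is_bn:
            selected.append(name)
    return selected


# A 2x2 image kernel inflated to a 3x2x2 volumetric kernel.
kernel = [[1.0, 0.0], [0.0, -1.0]]
inflated = inflate_kernel(kernel, depth=3)

# Parameter names of a toy model; only some of them are finetuned.
names = [
    "input_layer.weight",
    "block1.conv.weight",
    "block1.bn.weight",
    "output_layer.weight",
]
minimal = trainable_parameters(names, finetune_bn=False)
with_bn = trainable_parameters(names, finetune_bn=True)
```

In a real framework the same selection would typically be expressed by freezing every parameter whose name is not returned by `trainable_parameters`, while the pretrained convolution weights are loaded after inflation.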

One-sentence Summary: With minimal finetuning, image-pretrained models can be used directly for point-cloud understanding.

Supplementary Material: zip

Community Implementations: [7 code implementations](https://www.catalyzex.com/paper/image2point-3d-point-cloud-understanding-with/code)
