Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following

Ziyu Guo; Renrui Zhang; Xiangyang Zhu; Yiwen Tang; Xianzheng Ma; Jiaming Han; Aojun Zhou; Kexin Chen; Peng Gao; Xianzhi Li; Hongsheng Li; Pheng-Ann Heng

19 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Desk Rejected Submission

Keywords: 3D point cloud learning, multi-modality learning, large language model

Abstract: With the growing diversity of large-scale data, multi-modal learning has made notable progress in language and 2D vision. In 3D domains, however, how to develop an all-purpose multi-modal framework remains under-explored. To this end, we introduce Point-Bind, a 3D multi-modality model that aligns point clouds with 2D images, language, and audio. Guided by ImageBind, we construct a joint embedding space between 3D and the other modalities, enabling many promising applications, e.g., any-to-3D generation, 3D embedding arithmetic, and 3D open-world understanding. On top of this joint embedding space, we further present Point-LLM, a 3D large language model (LLM) that follows 3D and multi-modal instructions. Without any 3D instruction data, Point-LLM injects the semantics of Point-Bind into pre-trained LLMs, e.g., LLaMA, and exhibits superior 3D and multi-modal question-answering capacity. We conduct extensive experiments to demonstrate the effectiveness and generalizability of our approach for aligning 3D with multiple modalities.

Supplementary Material: pdf

Primary Area: representation learning for computer vision, audio, language, and other modalities

Submission Number: 1622
