PointMLLM: Aligning multi-modality with LLM for point cloud understanding, generation and editing

21 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: UNIPOINT-LLM: UNIFYING POINT CLOUD UNDERSTANDING AND MORE CONTROLLABLE 3D GENERATION
Abstract: We introduce UniPoint-LLM, which integrates point cloud understanding into an image Multimodal Large Language Model (MLLM) and enables more flexible and controllable natural-language-driven 3D generation, unifying point cloud understanding and generation in a single process. Unlike traditional text-to-3D methods with limited prompt inputs or constrained parameters, UniPoint-LLM allows users to specify their requirements in free-form natural language. By aligning the image and point cloud modalities through joint training and weight sharing, UniPoint-LLM achieves understanding of both modalities. Experiments demonstrate that UniPoint-LLM offers users greater flexibility and control in generating desired 3D objects, and that our Multimodal Universal Token Space (MUTS) is effective for understanding both images and point clouds. These experiments validate its potential value in practical applications of 3D generation and interactive design.
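As a rough illustration of the shared-token-space idea the abstract describes, the sketch below projects image and point cloud features into a common token space that could be interleaved with text embeddings before an LLM. All module names, feature dimensions, and the attention-pooling scheme are hypothetical; the submission's actual architecture is not public.

```python
# Hypothetical sketch of a "Multimodal Universal Token Space" (MUTS):
# separate projectors map modality-specific features into one shared
# token space. Names, shapes, and pooling are assumptions for
# illustration only, not the paper's design.
import torch
import torch.nn as nn

class ModalityProjector(nn.Module):
    """Maps modality-specific features to a fixed number of shared tokens."""
    def __init__(self, in_dim: int, token_dim: int, num_tokens: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, token_dim)
        # Learned queries attend over the projected features and
        # pool them into num_tokens shared-space tokens.
        self.queries = nn.Parameter(torch.randn(num_tokens, token_dim))
        self.attn = nn.MultiheadAttention(token_dim, num_heads=8, batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, seq, in_dim) -> tokens: (batch, num_tokens, token_dim)
        kv = self.proj(feats)
        q = self.queries.unsqueeze(0).expand(feats.size(0), -1, -1)
        tokens, _ = self.attn(q, kv, kv)
        return tokens

token_dim = 4096  # assumed LLM hidden size
image_proj = ModalityProjector(in_dim=1024, token_dim=token_dim, num_tokens=32)
point_proj = ModalityProjector(in_dim=384, token_dim=token_dim, num_tokens=32)

image_feats = torch.randn(2, 256, 1024)  # e.g. ViT patch features
point_feats = torch.randn(2, 512, 384)   # e.g. point cloud encoder features

# Both modalities land in the same token space, so they can be
# concatenated with text embeddings and fed to the LLM.
muts_tokens = torch.cat([image_proj(image_feats), point_proj(point_feats)], dim=1)
print(muts_tokens.shape)  # torch.Size([2, 64, 4096])
```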
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3392