VP-LLM: Text-Driven 3D Volume Completion with Large Language Models through Patchification

ICLR 2025 Conference Submission 794 Authors

14 Sept 2024 (modified: 13 Oct 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: Large Language Models, 3D completion, 3D generation
Abstract: 3D completion is a critical task in the vision industry. Diffusion-based methods have achieved commendable performance, but they suffer from several issues. First, they rely on models such as CLIP or BERT to encode textual information, which prevents them from supporting detailed and complex instructions. Moreover, their model sizes typically grow rapidly as scenes become larger or voxel resolutions increase, making them difficult to scale up. Motivated by the strong multi-modal understanding capabilities of recent large language models (LLMs), we introduce Volume Patch LLM (VP-LLM), which performs *user-friendly* conditional 3D completion and denoising in a token-based single forward pass. To integrate a 3D model into the LLM's textual domain, the incomplete 3D volume is first divided into smaller patches, a process we call "patchification", such that each patch can be encoded independently, analogous to the tokenization used by LLMs. These encoded patches are then concatenated with the encoded text-prompt sequence and fed into an LLM, which is fine-tuned to capture the relationships among patch tokens while injecting semantic meaning into the 3D object. Our results show that LLMs can interpret complex text instructions and understand 3D objects, surpassing the quality of state-of-the-art diffusion-based 3D completion models, especially when complex text prompts are given.
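
To make the patchification pipeline described above concrete, the following is a minimal sketch of how an incomplete voxel grid could be split into independently encoded patch tokens and concatenated with text-prompt embeddings before the LLM. The patch size, embedding width, linear projection, and tensor shapes here are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of the "patchification" idea, assuming a binary occupancy
# grid and a ViT-style linear patch projection; all hyperparameters below
# (patch_size=8, embed_dim=4096) are hypothetical choices for illustration.
import torch
import torch.nn as nn


class VoxelPatchEncoder(nn.Module):
    """Encodes each voxel patch independently into one LLM-sized token."""

    def __init__(self, patch_size: int = 8, embed_dim: int = 4096):
        super().__init__()
        self.patch_size = patch_size
        # One shared linear projection over each flattened patch.
        self.proj = nn.Linear(patch_size ** 3, embed_dim)

    def forward(self, volume: torch.Tensor) -> torch.Tensor:
        # volume: (B, D, H, W) occupancy grid, with D, H, W divisible by patch_size.
        p = self.patch_size
        b, d, h, w = volume.shape
        patches = (
            volume.reshape(b, d // p, p, h // p, p, w // p, p)
                  .permute(0, 1, 3, 5, 2, 4, 6)
                  .reshape(b, -1, p ** 3)   # (B, num_patches, p^3)
        )
        return self.proj(patches)            # (B, num_patches, embed_dim)


# Usage: patch tokens are concatenated with text-prompt embeddings and fed
# to the (fine-tuned) LLM in a single forward pass.
encoder = VoxelPatchEncoder(patch_size=8, embed_dim=4096)
volume = (torch.rand(1, 32, 32, 32) > 0.5).float()   # toy incomplete voxel grid
patch_tokens = encoder(volume)                        # (1, 64, 4096)
text_tokens = torch.randn(1, 16, 4096)                # placeholder prompt embeddings
llm_input = torch.cat([text_tokens, patch_tokens], dim=1)
```

Because each patch is projected independently, the number of input tokens grows with the number of patches rather than with a monolithic encoding of the whole volume, which is the property the abstract appeals to when arguing that the approach scales to larger scenes and higher voxel resolutions.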
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 794