GCML: Grounding Complex Motions using Large Language Model in 3D Scenes

14 Sept 2024 (modified: 27 Nov 2024)ICLR 2025 Conference Withdrawn SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Keywords: human-scene interaction, human motion generation, large language model, 3d visual grounding
Abstract: To solve the problem of generating complex motions, we introduce GCML (Grounding Complex Motions using Large Language Model). This method supports complex texts and scenes as inputs, such as mopping the floor in a cluttered room. Such everyday actions are challenging for current motion generation models for two main reasons. First, such complex actions are rarely found in existing HSI datasets, which places high demands on the generalization capabilities of current data-driven models. Second, these actions are composed of multiple stages, with considerable variation between them, making it difficult for models to understand and generate the appropriate motions. Current methods in the HSI field can control the generation of simple actions under multiple constraints, such as walking joyfully toward a door, but they cannot handle the complexity of tasks like the one described above. By incorporating a Large Language Model and a 3D Visual Grounding Model into the HSI domain, our approach can decompose complex user prompts into a sequence of simpler subtasks and identify interaction targets and obstacles within the scene. Based on these subtask descriptions and spatial control information, the Motion Generation Model generates a sequence of full-body motions, which are then combined into a long motion sequence that aligns with both the user's input and the scene semantics. Experimental results demonstrate that our method achieves competitive performance for simple action generation on the HUMANISE dataset and the generalization evaluation set. For complex motion generation, we created a new evaluation set by automatically generating possible behaviors of virtual humans in common indoor scenes, where our method significantly outperforms existing approaches. Project Page: https://anonymous.4open.science/w/GCML-4562/
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 753
Loading