Keywords: Human Motion Generation; VLM; 3D Generative Models
Abstract: Generating realistic 3D human motion is crucial to frontier applications of embodied intelligence, such as human-computer interaction and virtual reality. However, existing methods that rely solely on text or initial-pose inputs struggle to capture rich semantic understanding of, and interaction with, the environment, and most focus on single-person motion generation, neglecting multi-person scenarios. To address these challenges, we propose the VL2Motion generation paradigm, which combines natural-language instructions with visual inputs of the environment to generate realistic 3D human motion. The visual inputs not only provide a precise analysis of spatial layouts and environmental details but also impose inherent 3D spatial and world-knowledge constraints, ensuring that the generated motions are natural and contextually appropriate in real-world scenarios. Building on this paradigm, we introduce MMG-VL, a novel Multi-person Motion Generation approach driven by Vision and Language that synthesizes 3D human motion in multi-room home scenarios. The approach employs a two-stage pipeline: first, a Vision-Language Auxiliary Instruction (VILA) module integrates the multimodal inputs and generates multi-person motion instructions that satisfy real-world constraints; second, a Scenario-Interaction Diffusion (SID) module accurately generates the motions of multiple people. Our experiments demonstrate the superiority of the VL2Motion paradigm in environmental perception and interaction, as well as the effectiveness of MMG-VL at generating multi-person motions in multi-room home scenarios. We also release the complementary HumanVL dataset, containing 584 multi-room household images and 35,622 human motion samples, to further advance research in this domain.
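The two-stage pipeline named in the abstract can be summarized as a minimal Python sketch. This is purely illustrative: the class names, method signatures, and instruction fields (MotionInstruction, generate_instructions, generate_motion) are hypothetical placeholders, not the paper's actual API; only the stage ordering (VILA producing per-person instructions, SID decoding them into motions) follows the abstract.

from dataclasses import dataclass
from typing import Any, List

@dataclass
class MotionInstruction:
    # Illustrative fields only; the paper's actual instruction format is not given here.
    person_id: int
    room: str    # e.g. "kitchen"
    action: str  # e.g. "walk to the sink"

class VILA:
    """Stage 1 (hypothetical interface): fuse the multi-room scene image and
    the natural-language prompt into per-person motion instructions."""
    def generate_instructions(self, scene_image: Any, prompt: str) -> List[MotionInstruction]:
        # Placeholder body: the real module conditions a VLM on image + prompt.
        return [MotionInstruction(person_id=0, room="living room", action=prompt)]

class SID:
    """Stage 2 (hypothetical interface): a scenario-interaction diffusion
    model that decodes each instruction into a 3D motion sequence."""
    def generate_motion(self, scene_image: Any, instruction: MotionInstruction) -> List[List[float]]:
        # Placeholder body: the real module runs a conditioned diffusion sampler;
        # an empty list stands in for the per-frame pose parameters.
        return []

def mmg_vl(scene_image: Any, prompt: str) -> List[List[List[float]]]:
    # End-to-end pipeline: image + language -> one motion sequence per person.
    instructions = VILA().generate_instructions(scene_image, prompt)
    return [SID().generate_motion(scene_image, ins) for ins in instructions]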
Supplementary Material: zip
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 428