Pose-guided Motion Diffusion Model for Text-to-motion Generation

Xinhao Cai; Minghang Zheng; Qingchao Chen; Yuxin Peng; Yang Liu

26 Sept 2024 (modified: 15 Nov 2024); ICLR 2025 Conference Withdrawn Submission; CC BY 4.0

Keywords: Motion generation, Text to motion, Diffusion model, Generative model

Abstract: 3D human motion generation, especially text-conditioned motion generation, is a vital part of computer animation. However, a single textual description in the training data often couples multiple actions, which makes it harder for the model to learn individual actions. Additionally, the motions corresponding to a given text can be diverse, which makes learning difficult for the model and makes it hard for the user to control the generation of motions that contain a specific pose. Finally, motions with the same semantics can be expressed in many different ways in text, which further increases the difficulty of learning. To address these challenges, we propose Pose-Guided Text to Motion (PG-T2M) with the following designs. First, we divide each sentence into sub-sentences that each contain a single verb, so that the model learns the specific mapping from a single action description to its motion. Second, we use pose priors from static 2D natural images as control signals for each sub-sentence, allowing the model to generate more accurate and controllable 3D pose sequences that align with the sub-action descriptions. Finally, to enable the model to distinguish which sub-sentences describe similar semantics, we construct a pose memory that stores semantically similar sub-sentences and their corresponding pose representations in groups. Together, these designs enable our model to retrieve the pose information for every single action described in the text and use it to guide motion generation. Our method achieves state-of-the-art performance on the HumanML3D and KIT datasets.
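
The pipeline summarized in the abstract (split the text into sub-sentences, then look up a pose for each one in a grouped pose memory) can be illustrated with a toy sketch. This is not the authors' implementation, and every name in it is hypothetical: `split_sub_sentences` splits on a few connectives as a crude stand-in for verb-based splitting, `toy_embed` counts hand-picked word stems in place of a learned text encoder, and the stored poses are placeholder 2D coordinates rather than priors extracted from images.

```python
import math
import re

# ASSUMPTION: connective-based splitting stands in for the paper's
# one-verb-per-sub-sentence splitting, which is not specified in the abstract.
_CONNECTIVES = re.compile(r"\s*(?:,\s*then|\bthen\b|,|\band\b)\s*")

# ASSUMPTION: a stem-count vector stands in for a learned text embedding.
_STEMS = ("walk", "sit", "wav")


def split_sub_sentences(text):
    """Split a description into rough single-action sub-sentences."""
    parts = _CONNECTIVES.split(text.strip().rstrip("."))
    return [p.strip() for p in parts if p.strip()]


def toy_embed(text):
    """Return a tiny count vector over the hypothetical stems."""
    lowered = text.lower()
    return [float(lowered.count(stem)) for stem in _STEMS]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class PoseMemory:
    """Groups of semantically similar sub-sentences sharing one pose entry."""

    def __init__(self):
        self.groups = []

    def add_group(self, texts, pose):
        # The group key is the mean embedding of the member sub-sentences.
        vectors = [toy_embed(t) for t in texts]
        key = [sum(col) / len(vectors) for col in zip(*vectors)]
        self.groups.append({"key": key, "texts": list(texts), "pose": pose})

    def retrieve(self, sub_sentence):
        # Return the pose of the group whose key is closest to the query.
        query = toy_embed(sub_sentence)
        best = max(self.groups, key=lambda g: cosine(query, g["key"]))
        return best["pose"]


memory = PoseMemory()
# Placeholder 2D "poses": two (x, y) keypoints each, purely illustrative.
memory.add_group(["a man walks", "someone is walking"], [(0.0, 1.0), (0.2, 0.0)])
memory.add_group(["sits down", "a person sits"], [(0.0, 0.5), (0.3, 0.5)])
memory.add_group(["waves a hand", "waving"], [(0.0, 1.0), (0.4, 1.2)])

subs = split_sub_sentences("a person walks forward, then sits down and waves.")
poses = [memory.retrieve(s) for s in subs]
```

In this sketch, each retrieved pose would serve as the per-sub-sentence control signal. How PG-T2M actually conditions the diffusion model on these signals is not described in the abstract, so it is left out here.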

Supplementary Material: zip

Primary Area: generative models

Submission Number: 5449
