HuMouS: Human Motion Synthesis with Fine-Grained Control using Latent Space Manipulation of Cycle-Consistent Diffusion Models
Keywords: 3D humans, human motion synthesis
TL;DR: A novel method for controlled human motion synthesis
Abstract: We address the problem of spatially guided text-to-motion synthesis. While prior work has incorporated spatial constraints into text-to-motion diffusion models, existing methods still struggle to generate motions that align with the conditional controls. To this end, we propose Cycle Consistent Diffusion, a novel approach that improves controllable generation by explicitly optimizing frame-level cycle consistency between generated motions and conditional controls: for a given conditional control, the output motion is constrained to be consistent with the input spatial constraint. However, a straightforward implementation, while consistent with the input, often fails to match fine-grained control signals. We therefore introduce a novel test-time optimization framework that steers our pre-trained cycle-consistent diffusion model toward user-defined sparse constraints. We demonstrate an improvement of approximately 5 to 10 percent in the controllability of motion synthesis on the HumanML3D dataset, while significantly reducing foot-skating artifacts.
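The abstract describes test-time optimization that steers a pre-trained diffusion model toward sparse spatial constraints. The following is a minimal sketch of how such guidance is commonly realized (a gradient step on the noisy sample against a constraint-matching loss at each denoising step); the function name `guided_denoise_step`, the x0-parameterized denoiser, the tensor shapes, and the `step_size` schedule are all illustrative assumptions, not the paper's actual implementation.

```python
import torch

def guided_denoise_step(model, x_t, t, text_emb, control, mask, step_size=0.1):
    """One denoising step with test-time guidance toward sparse spatial
    constraints (hypothetical sketch, not the paper's API).
    control: target joint positions, shape (batch, frames, joints, 3).
    mask:    1 where a constraint is specified, 0 elsewhere.
    """
    x_t = x_t.detach().requires_grad_(True)
    # Predict the clean motion x0 from the noisy sample (assumes an
    # x0-parameterized denoiser; the paper's parameterization may differ).
    x0_pred = model(x_t, t, text_emb)
    # Frame-level consistency loss between the generated motion and the
    # sparse control signal, evaluated only on the constrained entries.
    loss = ((x0_pred - control) ** 2 * mask).sum() / mask.sum().clamp(min=1)
    grad = torch.autograd.grad(loss, x_t)[0]
    # Nudge the noisy sample so subsequent steps better satisfy the controls.
    return (x_t - step_size * grad).detach()

# Toy usage with a stand-in denoiser, purely to show shapes and data flow.
B, F, J = 1, 60, 22                       # batch, frames, joints
dummy_model = lambda x, t, c: x           # placeholder for a real denoiser
x_t = torch.randn(B, F, J, 3)
control = torch.zeros(B, F, J, 3)
mask = torch.zeros(B, F, J, 1)
mask[:, ::10, 0] = 1.0                    # sparse root-joint targets
x_next = guided_denoise_step(dummy_model, x_t, torch.tensor([500]),
                             None, control, mask)
```

In practice this guidance step would be interleaved with the standard reverse-diffusion update at each timestep; the masked loss here is one plausible instantiation of the frame-level consistency objective the abstract refers to.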
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 913