Keywords: 4D content generation, motion control
Abstract: Promptable 4D generation is a crucial task with broad applicability across industries, and has recently gained tremendous interest in the research community. However, existing works remain predominantly limited to image and text conditioning, which neglects the nuances of motion controllability. To address this, we propose to use a dynamic motion prompt defined by any number of point trajectories.
To translate user intention into this motion representation, we design a user-friendly interface that allows users to intuitively input motion trajectories, bringing images to life through direct interaction. Unlike prior works, our method leverages the prior knowledge of a base reconstruction model and integrates prompts without added modules, maintaining scalability and data efficiency without overhead, and achieving a full forward pass in under a second. Furthermore, instead of relying on existing appearance-focused learning frameworks, which suffer from poor motion fidelity, we design a novel physically inspired \textit{Vector Consistency Loss (VCL)} function for explicit motion learning.
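The abstract does not give the form of the Vector Consistency Loss. As a purely illustrative sketch (not the authors' definition), one way to make a loss "motion-focused" rather than appearance-focused is to compare per-step displacement vectors of predicted vs. target point trajectories, so that the loss is invariant to a constant positional offset; the function name and tensor layout below are assumptions:

```python
import numpy as np

def vector_consistency_loss(pred, target):
    """Hypothetical displacement-vector loss (illustrative only, not the
    paper's VCL). pred, target: arrays of shape [T, N, 2] holding N point
    trajectories over T frames."""
    pred_vec = np.diff(pred, axis=0)    # per-frame displacement vectors
    tgt_vec = np.diff(target, axis=0)
    # mean squared discrepancy between displacement vectors
    return float(np.mean(np.sum((pred_vec - tgt_vec) ** 2, axis=-1)))
```

Note that because only frame-to-frame displacements are compared, shifting an entire trajectory by a constant offset leaves this sketch's value unchanged, which is one plausible reading of "explicit motion learning" as opposed to matching absolute positions.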
Our quantitative and qualitative results show significant improvements in spatiotemporally precise and expressive motion control.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 4810