Keywords: 4D content generation, motion control
Abstract: Promptable 4D generation is a crucial task with broad applicability across industries, and has recently gained tremendous interest in the research community. However, existing works remain predominantly limited to image and text conditioning, which neglects the nuances of motion controllability. To address this, we propose to use a dynamic motion prompt defined by any number of point trajectories.
To translate user intention into this motion representation, we design a user-friendly interface that allows users to intuitively input motion trajectories, bringing images to life through direct interaction. Unlike prior works, our method leverages the prior knowledge of a base reconstruction model and integrates prompts without added modules, maintaining scalability and data efficiency without overhead, and achieving a full forward pass in under a second. Furthermore, instead of relying on existing appearance-focused learning frameworks, which suffer from poor motion fidelity, we design a novel physically inspired \textit{Vector Consistency Loss (VCL)} function for explicit motion learning.
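The abstract does not give the form of the Vector Consistency Loss. As a purely illustrative sketch (not the authors' definition), one way to make a loss "motion-focused" rather than appearance-focused is to compare per-step displacement vectors of predicted vs. target point trajectories, so that the loss is invariant to a constant positional offset; the function name and tensor layout below are assumptions:

```python
import numpy as np

def vector_consistency_loss(pred, target):
    """Hypothetical displacement-vector loss (illustrative only, not the
    paper's VCL). pred, target: arrays of shape [T, N, 2] holding N point
    trajectories over T frames."""
    pred_vec = np.diff(pred, axis=0)    # per-frame displacement vectors
    tgt_vec = np.diff(target, axis=0)
    # mean squared discrepancy between displacement vectors
    return float(np.mean(np.sum((pred_vec - tgt_vec) ** 2, axis=-1)))
```

Note that because only frame-to-frame displacements are compared, shifting an entire trajectory by a constant offset leaves this sketch's value unchanged, which is one plausible reading of "explicit motion learning" as opposed to matching absolute positions.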
Our quantitative and qualitative results show significant improvements in spatiotemporally precise and expressive motion control.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 4810