Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models


ICLR 2026 Conference Submission 25066 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · Readers: Everyone · CC BY 4.0

Keywords: Motion generation, Motion Tracking & Transfer

TL;DR: A method to animate humanoid meshes from a text prompt by transferring motion generated by video diffusion models to the mesh.

Abstract: Animation of humanoid characters is essential in various graphics applications, but creating realistic animations requires significant time and cost. We propose an approach to synthesize 4D animated sequences of input static 3D humanoid meshes, leveraging the strong, generalized motion priors of generative video models, since such models capture a wide variety of human motions. From an input static 3D humanoid mesh and a text prompt describing the desired animation, we synthesize a corresponding video conditioned on a rendered image of the 3D mesh. We then use an underlying SMPL representation and our motion optimization to animate the 3D mesh according to the motion in the generated video. This provides a cost-effective and accessible way to synthesize diverse and realistic 4D animations.
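The abstract does not detail the motion optimization. Purely as an illustration of the general idea (fitting a skeleton's per-frame pose to motion observed in a video), here is a minimal sketch that is not the authors' method. It assumes a toy 2D planar kinematic chain as a stand-in for SMPL, synthetic per-frame 2D joint targets as a stand-in for keypoints tracked in the generated video, and a hypothetical temporal-smoothness weight `lam`. All names (`BONE_LENGTHS`, `forward_kinematics`, `loss_and_grad`, `fit_motion`) are hypothetical. It minimizes a keypoint term plus a smoothness term by plain gradient descent with an analytic gradient.

```python
import numpy as np

# Hypothetical 3-bone planar chain standing in for a real body model such as SMPL.
BONE_LENGTHS = np.array([1.0, 1.0, 1.0])


def forward_kinematics(theta):
    """theta: (T, J) relative joint angles -> joint positions (T, J, 2), absolute angles (T, J)."""
    phi = np.cumsum(theta, axis=1)  # absolute bone orientations
    seg = BONE_LENGTHS[None, :, None] * np.stack([np.cos(phi), np.sin(phi)], axis=-1)
    return np.cumsum(seg, axis=1), phi  # each joint is the end of its bone


def loss_and_grad(theta, targets, lam):
    """Keypoint-matching term plus temporal smoothness, with an analytic gradient."""
    pos, phi = forward_kinematics(theta)
    r = pos - targets  # (T, J, 2) residuals against the tracked keypoints
    data = np.sum(r ** 2)
    # dE/dtheta_j = sum_{i>=j} L_i * perp(phi_i) . sum_{k>=i} 2 r_k
    tail = np.cumsum((2 * r)[:, ::-1], axis=1)[:, ::-1]  # reverse cumsum over joints
    perp = np.stack([-np.sin(phi), np.cos(phi)], axis=-1)
    g = BONE_LENGTHS[None, :] * np.sum(perp * tail, axis=-1)
    grad = np.cumsum(g[:, ::-1], axis=1)[:, ::-1].copy()
    # Temporal smoothness: lam * sum_t ||theta_{t+1} - theta_t||^2
    d = np.diff(theta, axis=0)
    smooth = lam * np.sum(d ** 2)
    grad[1:] += 2 * lam * d
    grad[:-1] -= 2 * lam * d
    return data + smooth, grad


def fit_motion(targets, lam=0.1, lr=0.02, steps=2000):
    """Gradient descent on all frames jointly, starting from the rest pose."""
    theta = np.zeros(targets.shape[:2])
    for _ in range(steps):
        _, grad = loss_and_grad(theta, targets, lam)
        theta -= lr * grad
    return theta


# Synthetic smooth "video" motion; its joint positions play the role of tracked keypoints.
T = 20
t = np.arange(T)
true_theta = np.stack([0.5 + 0.3 * np.sin(2 * np.pi * t / T),
                       0.4 * np.cos(2 * np.pi * t / T),
                       np.full(T, -0.3)], axis=1)
targets, _ = forward_kinematics(true_theta)

initial_loss, _ = loss_and_grad(np.zeros_like(true_theta), targets, 0.1)
theta = fit_motion(targets)
final_loss, _ = loss_and_grad(theta, targets, 0.1)
pos, _ = forward_kinematics(theta)
mean_err = np.linalg.norm(pos - targets, axis=-1).mean()
print(f"loss {initial_loss:.3f} -> {final_loss:.5f}, mean joint error {mean_err:.4f}")
```

The smoothness weight trades per-frame fidelity for temporally coherent motion, which matters when keypoints extracted from generated video jitter. A real pipeline would replace the toy chain with a full body model and typically add priors and a camera model. Those components are outside what this sketch assumes.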

Supplementary Material: zip

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 25066
