MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY-NC 4.0
Abstract: In recent years, while generative AI has advanced significantly in image generation, video generation continues to face challenges in controllability, length, and detail quality, hindering its practical application. We present MimicMotion, a framework for generating high-quality human videos of arbitrary length using motion guidance. Our approach has several highlights. First, we introduce confidence-aware pose guidance, which ensures high frame quality and temporal smoothness. Second, we propose regional loss amplification based on pose confidence, which reduces image distortion in key regions. Finally, we present a progressive latent fusion strategy for generating long, smooth videos. Experiments demonstrate the effectiveness of our approach in producing high-quality human motion videos. Videos and comparisons are available at [https://tencent.github.io/MimicMotion](https://tencent.github.io/MimicMotion).
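The regional loss amplification described above can be illustrated with a minimal sketch. This is an assumption-laden toy (the function name, the thresholding scheme, and the squared-error loss are illustrative, not the paper's actual implementation): the idea is simply to up-weight the reconstruction loss in regions where pose-estimation confidence is high, such as hands.

```python
import numpy as np

def confidence_weighted_loss(pred, target, confidence,
                             amplify=2.0, threshold=0.8):
    """Toy sketch of confidence-based regional loss amplification.

    `pred`/`target` are per-pixel arrays; `confidence` holds pose-keypoint
    confidence per pixel. All names and hyperparameters here are
    hypothetical, not taken from the MimicMotion codebase.
    """
    # Per-pixel squared-error reconstruction loss
    loss = (pred - target) ** 2
    # Amplify the loss where pose confidence exceeds the threshold,
    # so distortions in confidently-detected regions are penalized more
    weight = np.where(confidence > threshold, amplify, 1.0)
    return float((loss * weight).mean())
```

For example, with a confidence map that is uniformly high, the weighted loss is simply `amplify` times the unweighted mean squared error; mixed-confidence maps interpolate between the two.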
Lay Summary: Generating realistic videos with AI remains challenging, particularly when creating detailed human movements over long durations. To tackle this, we developed MimicMotion, an AI framework that generates smooth, realistic videos of humans performing movements of any length. Our method uses pose templates to guide human video generation precisely, ensuring each frame stays clear and movements flow naturally. We also enhance hand generation to prevent distortion and employ a progressive blending technique to produce smoother, longer videos. This technology opens up new possibilities in areas like filmmaking and animation.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/Tencent/MimicMotion
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Human Motion Video Generation
Submission Number: 3355