FreeAction: Training-Free Techniques for Enhanced Fidelity of Trajectory-to-Video Generation

Published: 18 Sept 2025 · Last Modified: 18 Sept 2025 · LSRW Poster · CC BY 4.0
Keywords: Trajectory-to-Video Generation, Robot Video Generation, Visual Policy Learning, Training-free Techniques
TL;DR: We introduce two training-free, inference-time controls (action-scaled CFG and noise truncation) that steer diffusion by action magnitude, improving action coherence and visual quality on real-robot manipulation datasets.
Abstract: Generating realistic robot videos from explicit action trajectories is a critical step toward building effective world models and robotics foundation models. We introduce two training-free, inference-time techniques that fully exploit explicit action parameters in diffusion-based robot video generation. Instead of treating action vectors as passive conditioning signals, our methods use them actively to steer both classifier-free guidance and the initialization of the Gaussian latents. First, action-scaled classifier-free guidance dynamically modulates guidance strength in proportion to action magnitude, enhancing controllability over motion intensity. Second, action-scaled noise truncation adjusts the distribution of the initially sampled noise to better align with the desired motion dynamics. Experiments on real-robot manipulation datasets demonstrate that these techniques significantly improve action coherence and visual quality across diverse robot environments.
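The abstract specifies neither the exact scaling function for the guidance weight nor the truncation rule for the initial noise, so the sketch below is only illustrative: a minimal PyTorch rendering that assumes a linear dependence on the action-trajectory norm, with clamping used as a crude stand-in for truncated Gaussian sampling. All names and parameters here (action_scaled_cfg, action_scaled_noise_truncation, w_base, alpha, beta) are hypothetical, not taken from the paper.

```python
import torch

def action_magnitude(actions: torch.Tensor) -> torch.Tensor:
    """Per-sample L2 norm of the flattened action trajectory, shape (B,)."""
    return actions.flatten(start_dim=1).norm(dim=-1)

def action_scaled_cfg(eps_uncond: torch.Tensor,
                      eps_cond: torch.Tensor,
                      actions: torch.Tensor,
                      w_base: float = 7.5,
                      alpha: float = 0.1) -> torch.Tensor:
    # Guidance scale grows with action magnitude (assumed linear form),
    # so high-motion trajectories receive stronger guidance.
    w = w_base * (1.0 + alpha * action_magnitude(actions))
    w = w.view(-1, *([1] * (eps_cond.dim() - 1)))  # broadcast over latent dims
    return eps_uncond + w * (eps_cond - eps_uncond)

def action_scaled_noise_truncation(latent_shape, actions,
                                   beta: float = 0.1,
                                   generator=None) -> torch.Tensor:
    # Truncation bound widens with action magnitude (assumed linear form);
    # low-motion trajectories start from lower-variance initial noise.
    noise = torch.randn(latent_shape, generator=generator,
                        device=actions.device)
    bound = 1.0 + beta * action_magnitude(actions)
    bound = bound.view(-1, *([1] * (len(latent_shape) - 1)))
    return noise.clamp(min=-bound, max=bound)

# Usage: a batch of 2 videos with 16-step, 7-DoF action trajectories,
# denoised in a (C=4, T=16, H=32, W=32) latent space (shapes are illustrative).
actions = torch.randn(2, 16, 7)
latents = action_scaled_noise_truncation((2, 4, 16, 32, 32), actions)
```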
Serve As Reviewer: ~Seungwook_Kim2
Submission Number: 3