AutoVFX: Physically Realistic Video Editing from Natural Language Instructions

Hao-Yu Hsu; Zhi-Hao Lin; Albert J. Zhai; Hongchi Xia; Shenlong Wang

Published: 23 Mar 2025; Last Modified: 24 Mar 2025; Venue: 3DV 2025 Poster; License: CC BY 4.0

Keywords: Visual Effects, Text-guided Video Editing, Scene Simulation, Physical Simulation, LLM Agent, Object Insertion, Material Editing

Abstract: Modern visual effects (VFX) software has made it possible for skilled artists to create imagery of virtually anything. However, the creation process remains laborious, complex, and largely inaccessible to everyday users. In this work, we present AutoVFX, a framework that automatically creates realistic and dynamic VFX videos from a single video and natural language instructions. By carefully integrating neural scene modeling, LLM-based code generation, and physical simulation, AutoVFX is able to provide physically grounded, photorealistic editing effects that can be controlled directly using natural language instructions. We conduct extensive experiments to validate AutoVFX's efficacy across a diverse spectrum of videos and instructions. Quantitative and qualitative results suggest that AutoVFX outperforms all competing methods by a large margin in generative quality, instruction alignment, editing versatility, and physical plausibility.

Supplementary Material: zip

Submission Number: 107
