Keywords: video editing
Abstract: We present VEFlow, a novel flow-based, training-free, text-driven video editing framework that edits video content according to an editing prompt in an inversion-free manner. Departing from existing training-free methods that rely on an "invert-then-edit" pipeline, we build upon flow-based generative models and derive a novel video editing flow governed by a dedicated ordinary differential equation (ODE), which transforms the source video into the target video directly within the data space, thereby eliminating the need for explicit inversion. This new paradigm enables precise control over the editing region and inspires us to develop the attention-guided flow masking (AFM) module to suppress unintended alterations. It masks out the undesired editing flow by identifying the region where the edit occurs, using a cross-attention mask extracted from the source and target editing-flow estimates. In addition, we observe that the estimated video editing flow may yield insufficient edits due to conflicts between the source and target flows. To tackle this issue, we further design the decoupled flow modulation (DFM) module, which mitigates such editing conflicts and enhances editing performance through flow projection and modulation. By incorporating these designs, our approach demonstrates significant superiority over existing methods, particularly in editing efficiency, background preservation, and content editability. Extensive experiments on real-world videos confirm the effectiveness of our approach, offering a fresh perspective on text-driven video editing.
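To make the abstract's mechanism concrete, the following is a minimal, hypothetical sketch of one Euler step of a data-space editing ODE with AFM-style masking and DFM-style projection. All names, the difference-of-flows form of the editing flow, and the projection rule are assumptions for illustration; the abstract does not specify the actual equations.

```python
import numpy as np

def edit_step(x, v_src, v_tgt, attn_mask, dt=0.1):
    """One hypothetical Euler step of a data-space video editing ODE.

    x         : current video tensor (frames/latents)
    v_src     : estimated flow under the source prompt (assumed)
    v_tgt     : estimated flow under the target prompt (assumed)
    attn_mask : cross-attention-derived mask, 1 inside the edit region
    """
    # Assumed editing flow: difference of target and source flow estimates.
    v_edit = v_tgt - v_src

    # AFM-style masking (assumption): confine the editing flow to the
    # edit region so the background is preserved.
    v_edit = attn_mask * v_edit

    # DFM-style decoupling (assumption): remove the component of the
    # editing flow that opposes the source flow, via vector projection.
    denom = np.sum(v_src * v_src) + 1e-8
    coef = np.sum(v_edit * v_src) / denom
    if coef < 0:  # only a conflicting (negative) component is removed
        v_edit = v_edit - coef * v_src

    # Euler update of the ODE in data space.
    return x + dt * v_edit
```

Under this sketch, pixels outside the attention mask receive zero editing flow, so the background stays untouched across integration steps.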
Primary Area: generative models
Submission Number: 3461