Keywords: Face Video Editing, Face Image Editing, Precision Guidance
Abstract: Preserving identity while precisely manipulating attributes is a central challenge
in face editing for both images and videos. Existing methods often introduce visual artifacts or fail to maintain temporal consistency. We present **FlowGuide**,
a unified framework that achieves fine-grained control over face editing in diffusion models. Our approach is founded on the local linearity of the UNet bottleneck’s latent space, which allows us to treat semantic attributes as corresponding
to specific linear subspaces, providing a mathematically sound basis for disentanglement. FlowGuide first identifies a set of orthogonal basis vectors that span
these semantic subspaces for both the original content and the target edit, a representation that efficiently captures the most salient features of each. We then
introduce a novel guidance mechanism that quantifies the geometric alignment
between these bases to dynamically steer the denoising trajectory at each step.
This approach offers superior control by ensuring edits are confined to the desired
attribute’s semantic axis while preserving orthogonal components related to identity. Extensive experiments demonstrate that FlowGuide achieves state-of-the-art
performance, producing high-quality edits with superior identity preservation and
temporal coherence. Our code is available at: https://github.com/yl4467/flow_edit.
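The abstract's core mechanism can be sketched in a few lines: extract orthonormal bases for the source and target semantic subspaces from bottleneck activations, measure their geometric alignment via principal angles, and apply an edit that is confined to the target attribute's subspace while removing any component lying in the identity subspace. This is a minimal illustrative sketch only, not the authors' implementation; the function names, the SVD-based basis extraction, and the mean-cosine alignment score are all assumptions made for illustration.

```python
import numpy as np

def orthonormal_basis(features, k):
    # features: (n_samples, d) activations, e.g. from a UNet bottleneck.
    # The top-k right singular vectors of the centered features span
    # a k-dimensional semantic subspace (assumed construction).
    _, _, vt = np.linalg.svd(features - features.mean(0), full_matrices=False)
    return vt[:k].T  # (d, k), columns are orthonormal

def subspace_alignment(basis_a, basis_b):
    # Singular values of A^T B are the cosines of the principal angles
    # between the two subspaces; their mean is one simple alignment score.
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return float(cosines.mean())  # in [0, 1]

def guided_step(latent, edit_dir, basis_attr, basis_id, scale=1.0):
    # Project the raw edit direction onto the target attribute's subspace,
    # then remove any component falling in the identity subspace, so the
    # update leaves identity-related directions untouched.
    e = basis_attr @ (basis_attr.T @ edit_dir)  # confine to attribute axis
    e = e - basis_id @ (basis_id.T @ e)         # preserve identity components
    return latent + scale * e
```

Because `basis_id` has orthonormal columns, the projected-out edit satisfies `basis_id.T @ (guided - latent) = 0` exactly, which is the sense in which orthogonal identity components are preserved in this sketch.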
Primary Area: generative models
Submission Number: 3400