Keywords: Face Stylization, Diffusion Model, Identity Preservation
Abstract: The canonical challenge in face stylization lies in disentangling high-level semantic
content, such as identity, from low-level stylistic attributes. Prevailing methods,
including recent diffusion-based models, often fail to achieve a robust separation,
resulting in an undesirable trade-off between style fidelity and content preservation.
To address these challenges, we introduce **StyleFace**, a novel framework that
treats face stylization as a targeted statistical transfer within a disentangled feature
space. Our approach is a cohesive pipeline that begins with a disentangled attention
module, which orthogonally projects content and style information into separate,
controllable embeddings. This separation is critical: it enables our method's core,
a statistical style injection layer that manipulates feature distributions to inject
the target style while preserving identity. To guide this transfer and ensure global coherence,
the entire process is optimized using a perceptually-aligned adversarial objective
that operates not on raw pixels, but on the high-level feature manifold of a Vision
Transformer (ViT), enforcing perceptual and stylistic consistency. This synergistic
design allows StyleFace to achieve an unprecedented balance between identity
preservation and style fidelity, with comprehensive experiments demonstrating that
our model consistently outperforms state-of-the-art methods.
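
To make the two core ideas concrete, the sketch below illustrates (i) an AdaIN-style transfer of channel-wise feature statistics, one plausible reading of the statistical style injection layer, and (ii) a feature-space reconstruction term computed on intermediate ViT activations, in the spirit of the perceptually-aligned objective. All function and tensor names here are illustrative assumptions, not the submission's actual implementation.

```python
import torch
import torch.nn.functional as F


def statistical_style_injection(content_feat: torch.Tensor,
                                style_feat: torch.Tensor,
                                eps: float = 1e-5) -> torch.Tensor:
    """Transfer channel-wise mean/std from style features to content features.

    content_feat, style_feat: (B, C, H, W) activations from the disentangled
    content and style branches. AdaIN-like sketch, not the paper's exact layer.
    """
    # Per-channel statistics over the spatial dimensions.
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps

    # Whiten the content statistics, then re-color with the style statistics,
    # leaving the spatial (identity-bearing) structure of the content intact.
    normalized = (content_feat - c_mean) / c_std
    return normalized * s_std + s_mean


def vit_perceptual_loss(feature_extractor, stylized: torch.Tensor,
                        reference: torch.Tensor) -> torch.Tensor:
    """Compare images in the feature space of a frozen ViT.

    feature_extractor: hypothetical callable mapping an image batch to a list
    of intermediate ViT feature maps; matching layers are compared with L1.
    """
    stylized_feats = feature_extractor(stylized)
    reference_feats = feature_extractor(reference)
    return sum(F.l1_loss(s, r) for s, r in zip(stylized_feats, reference_feats))
```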
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 4432