Abstract: Face identity customization, i.e., face generation with a specified identity, has received increasing attention owing to its extensive applications in personalized content creation. Although existing methods achieve high identity consistency with reference faces, they still struggle to precisely manipulate fine-grained facial attributes. We attribute this issue to the inherent entanglement of identity and attribute information, as well as to the lack of attribute-specific supervision. To address it, we propose AttPortrait, a high-quality identity-attribute conditional face generation framework. Building on a foundational face diffusion model, we introduce an extra disentanglement branch alongside the conventional denoising branch during the training stage. This extra branch employs explicit attribute supervision to encourage the model to capture the attribute information from the text prompts, thereby disentangling identity from attributes and achieving precise attribute manipulation with high identity consistency. Comprehensive experiments demonstrate that our method improves attribute accuracy by 34%, while maintaining identity similarity on par with state-of-the-art methods and achieving competitive FID scores across both real and synthetic datasets.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Prayag_Tiwari1
Submission Number: 8798