Face2Diffusion for Fast and Editable Face Personalization

Kaede Shiohara, Toshihiko Yamasaki

Published: 01 Jan 2024, Last Modified: 14 Feb 2025, Venue: CVPR 2024, License: CC BY-SA 4.0

Abstract: Face personalization aims to insert specific faces, taken from images, into pretrained text-to-image diffusion models. However, it is still challenging for previous methods to preserve both identity similarity and editability because they overfit to training samples. In this paper, we propose Face2Diffusion (F2D) for high-editability face personalization. The core idea behind F2D is that removing identity-irrelevant information from the training pipeline prevents the overfitting problem and improves the editability of encoded faces. F2D consists of the following three novel components: 1) A multi-scale identity encoder provides well-disentangled identity features while keeping the benefits of multi-scale information, which improves the diversity of camera poses. 2) Expression guidance disentangles face expressions from identities and improves the controllability of face expressions. 3) Class-guided denoising regularization encourages models to learn how faces should be denoised, which boosts the text-alignment of backgrounds. Extensive experiments on the FaceForensics++ dataset and diverse prompts demonstrate that our method greatly improves the trade-off between identity- and text-fidelity compared to previous state-of-the-art methods. Code is available at https://github.com/mapooon/Face2Diffusion.