Diff-ID: Identity-Consistent Facial Image Generation and Morphing via Diffusion Models

ICLR 2026 Conference Submission 20320 Authors

19 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: Diffusion Models, Identity Preservation, Adapters, CLIP, ArcFace, Generative Models
Abstract: Generative diffusion models have revolutionized facial image synthesis, yet robust identity preservation in high-resolution outputs remains a critical challenge. Identity preservation is especially vital in security systems, biometric authentication, and privacy-sensitive applications, where any drift in identity can undermine trust and functionality. We introduce Diff-ID, a diffusion-based framework that enforces identity consistency while delivering photorealistic quality. Central to our approach is a custom 210K-image dataset synthesized from CelebA-HQ, FFHQ, and LAION-Face and captioned via a fine-tuned BLIP model to bolster identity awareness during training. Diff-ID integrates ArcFace and CLIP embeddings through a dual cross-attention adapter within a fine-tuned Stable Diffusion U-Net. To further reinforce identity fidelity, we propose a pseudo-discriminator loss based on ArcFace cosine similarity with exponential timestep weighting. Experiments on held-out and unseen faces demonstrate that Diff-ID outperforms state-of-the-art methods in both identity retention and visual realism. Finally, we showcase a unified DDIM-based morphing pipeline that enables seamless facial interpolation without per-identity fine-tuning.
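The abstract does not spell out the pseudo-discriminator loss, but a minimal sketch of one plausible form — (1 − ArcFace cosine similarity) between the denoised prediction's and the reference face's embeddings, exponentially down-weighted at noisier timesteps — might look like the following. The function name, the weight form exp(−αt/T), and the value of α are illustrative assumptions, not the authors' definitions.

```python
import numpy as np

def pseudo_discriminator_loss(emb_pred, emb_ref, t, T=1000, alpha=5.0):
    """Hypothetical identity loss: 1 - cos(emb_pred, emb_ref), scaled by
    an exponential timestep weight. emb_pred / emb_ref stand in for
    ArcFace embeddings of the denoised prediction and the reference face;
    alpha and exp(-alpha * t / T) are illustrative choices, not the
    paper's stated schedule."""
    cos = np.dot(emb_pred, emb_ref) / (
        np.linalg.norm(emb_pred) * np.linalg.norm(emb_ref)
    )
    # Down-weight very noisy steps (large t), where the denoised face is
    # unreliable, so the identity signal dominates near the end of sampling.
    w = np.exp(-alpha * t / T)
    return w * (1.0 - cos)
```

Under this sketch, identical embeddings yield zero loss at any timestep, while a mismatched identity is penalized most strongly at small t, where the generated face is nearly clean.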
Primary Area: generative models
Submission Number: 20320