Fine-Grained Face Sketch-Photo Synthesis with Text-Guided Diffusion Models

Published: 01 Jan 2023, Last Modified: 11 Jan 2025 · ACPR (2) 2023 · CC BY-SA 4.0
Abstract: Face sketch-photo synthesis involves generating face photos from input face sketches. However, existing Generative Adversarial Network (GAN)-based methods struggle to produce high-quality images, suffering from artifacts and a lack of detail caused by training instability. Additionally, prior approaches yield fixed, monotonous image styles, limiting their practical usability. Drawing inspiration from the recent success of Diffusion Probabilistic Models (DPMs) in image generation, we present a novel DPM-based framework that produces detailed face photos from input sketches while allowing control over facial attributes through textual descriptions. Our framework comprises a U-Net, a semantic sketch encoder that extracts information from the input sketch, and a text encoder that converts textual descriptions into text features. We further incorporate a cross-attention mechanism within the U-Net to integrate the text features. Experimental results demonstrate the effectiveness of our model: it generates high-fidelity face photos and surpasses alternative methods in both qualitative and quantitative evaluations.
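To illustrate the conditioning mechanism the abstract describes, below is a minimal sketch (not the authors' code) of how text features can be injected into U-Net feature maps via cross-attention, written in PyTorch. All tensor shapes, dimensions, and module names are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch: U-Net activations attend to encoded text tokens.
# Dimensions and names are assumptions for illustration only.
import torch
import torch.nn as nn


class TextCrossAttention(nn.Module):
    """Cross-attention block: image features (queries) attend to text tokens."""

    def __init__(self, feat_dim: int, text_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            embed_dim=feat_dim, num_heads=num_heads,
            kdim=text_dim, vdim=text_dim, batch_first=True,
        )
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, feats: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) U-Net activations; text: (B, T, text_dim) tokens.
        b, c, h, w = feats.shape
        q = feats.flatten(2).transpose(1, 2)          # (B, H*W, C): queries
        out, _ = self.attn(self.norm(q), text, text)  # keys/values from text
        out = q + out                                 # residual connection
        return out.transpose(1, 2).reshape(b, c, h, w)


# Toy usage: a 64-channel feature map conditioned on 77 text tokens of dim 512.
feats = torch.randn(2, 64, 16, 16)
text = torch.randn(2, 77, 512)
block = TextCrossAttention(feat_dim=64, text_dim=512)
print(block(feats, text).shape)  # torch.Size([2, 64, 16, 16])
```

In a full DPM, a block like this would typically sit at several resolutions of the denoising U-Net, letting the text features steer facial attributes while the sketch encoder's output conditions the spatial structure.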
