RoboFace: Face Restoration Made Robust via Implicit and Explicit Textual Guidance

19 Sept 2025 (modified: 13 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Computer Vision, Blind Face Restoration, Degradation Robustness, Textual Guidance
Abstract: Existing blind face restoration methods often struggle with out-of-distribution degradations. High-level latent spaces such as discrete codebooks offer some robustness but frequently introduce unnatural artifacts under severe corruption, while alternative approaches depend on costly degradation scaling and large-scale retraining of diffusion models. In this paper, we propose RoboFace, a novel framework that achieves robust face restoration by prompting a pre-trained diffusion model with dual textual guidance. Given a low-quality input, RoboFace constructs a structured, semantically aligned space through two complementary guides: implicit guidance from CLIP latent features to preserve visual fidelity and identity, and explicit guidance from natural-language prompts for flexible, user-interactive control. The two guides are integrated via a Decoupled Cross-Attention (DCA) module, which adaptively aligns them with the pre-trained diffusion model. Extensive experiments demonstrate that RoboFace is exceptionally robust across a wide spectrum of degradations, delivering state-of-the-art results even on challenging low-quality surveillance faces. These results highlight the promise of semantic guidance as a reliable and flexible paradigm for robust face restoration.
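The abstract describes the DCA module as fusing the implicit CLIP-feature guide and the explicit text guide into a frozen diffusion backbone, which suggests a decoupled cross-attention pattern in the style of IP-Adapter: one cross-attention branch per guidance source, with the branch outputs combined additively. The PyTorch sketch below is a hypothetical illustration of that pattern only; the class name, dimensions, learnable scale, and additive fusion are assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a Decoupled Cross-Attention (DCA) block, assuming
# an IP-Adapter-style design: one cross-attention branch per guidance
# source, fused additively. All names and dims are illustrative.
import torch
import torch.nn as nn


class DecoupledCrossAttention(nn.Module):
    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        # Branch 1: explicit guidance from natural-language text embeddings.
        self.text_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Branch 2: implicit guidance from CLIP latent features of the
        # low-quality face, intended to preserve identity and fidelity.
        self.clip_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Learnable scale to adaptively weight the implicit branch
        # (an assumed mechanism for the paper's "adaptive alignment").
        self.clip_scale = nn.Parameter(torch.tensor(1.0))

    def forward(self, hidden_states, text_emb, clip_emb):
        # hidden_states: diffusion UNet feature tokens, shape (B, N, dim)
        # text_emb:      text-prompt embeddings, shape (B, T, dim)
        # clip_emb:      CLIP image-latent tokens, shape (B, C, dim)
        text_out, _ = self.text_attn(hidden_states, text_emb, text_emb)
        clip_out, _ = self.clip_attn(hidden_states, clip_emb, clip_emb)
        # Decoupled fusion: each guide attends independently, then the
        # results are summed into the UNet stream.
        return hidden_states + text_out + self.clip_scale * clip_out


# Toy usage with random tensors.
if __name__ == "__main__":
    dca = DecoupledCrossAttention()
    h = torch.randn(2, 64, 768)   # UNet tokens
    t = torch.randn(2, 77, 768)   # text embeddings
    c = torch.randn(2, 16, 768)   # CLIP image tokens
    print(dca(h, t, c).shape)     # torch.Size([2, 64, 768])
```

Keeping the two attention branches separate lets the text branch reuse the pre-trained model's existing text conditioning while the image branch is trained as a lightweight add-on, which is the usual motivation for this decoupled design.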
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 17122