Abstract: Face image inpainting, particularly with user-controllable customization, aims to restore degraded facial regions while adhering to user-provided instructions. Traditional inpainting methods often focus solely on restoring visual fidelity and cannot incorporate user prompts or semantic guidance. In this work, we present InpaintFormer, a novel framework for user-controlled face image inpainting guided by textual prompts. Specifically, we propose a Prompt-guided Feature Modulation (PGFM) module that aligns visual features with user instructions, using a pre-trained CLIP model to extract text and image embeddings. These embeddings are fused to modulate the encoded image features, ensuring semantic consistency with the prompt. Additionally, a Degradation Mask Predictor (DMP) identifies the degraded regions that require inpainting, while a Mask-Aware Self-Attention (MASA) mechanism within the Transformer refines the inpainting process by selectively attending to non-degraded regions, yielding realistic results. By combining PGFM, DMP, and MASA, InpaintFormer enables controllable face image inpainting with high fidelity and semantic alignment. Extensive experiments demonstrate that InpaintFormer outperforms state-of-the-art inpainting methods in controllability and naturalness.
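The abstract does not specify implementation details, but a minimal PyTorch sketch may help ground the two core mechanisms it names: (a) a FiLM-style reading of the Prompt-guided Feature Modulation, in which fused CLIP text and image embeddings produce per-channel scale and shift parameters for the encoded features, and (b) a mask-aware self-attention step that blocks attention to degraded key positions. All class and function names, dimensions, and the FiLM-style formulation below are assumptions for illustration, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PromptGuidedModulation(nn.Module):
    """Hypothetical sketch of a PGFM-style module (assumed design).

    Fuses CLIP text and image embeddings into per-channel scale/shift
    parameters (FiLM-style) applied to the encoded image features.
    """

    def __init__(self, clip_dim: int = 512, feat_channels: int = 256):
        super().__init__()
        # Fuse concatenated text + image embeddings into a joint vector.
        self.fuse = nn.Sequential(
            nn.Linear(2 * clip_dim, clip_dim),
            nn.GELU(),
        )
        # Predict per-channel scale (gamma) and shift (beta).
        self.to_gamma = nn.Linear(clip_dim, feat_channels)
        self.to_beta = nn.Linear(clip_dim, feat_channels)

    def forward(self, feats, text_emb, img_emb):
        # feats: (B, C, H, W) encoded image features
        # text_emb, img_emb: (B, clip_dim) CLIP embeddings
        joint = self.fuse(torch.cat([text_emb, img_emb], dim=-1))
        gamma = self.to_gamma(joint)[:, :, None, None]  # (B, C, 1, 1)
        beta = self.to_beta(joint)[:, :, None, None]
        # Modulate features so they track the fused prompt semantics.
        return feats * (1.0 + gamma) + beta


def mask_aware_attention(q, k, v, degraded_mask):
    """Hypothetical MASA-style attention step (assumed design).

    q, k, v: (B, N, D) token sequences; degraded_mask: (B, N) bool,
    True where a token lies in a degraded region. Keys/values at
    degraded positions are masked out, so queries attend only to
    valid (non-degraded) regions. Assumes each image has at least
    one non-degraded token, otherwise the softmax row is undefined.
    """
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)  # (B, N, N)
    # Block attention *to* degraded key positions.
    scores = scores.masked_fill(degraded_mask[:, None, :], float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```

In this reading, the DMP would supply `degraded_mask`, and the modulated features from `PromptGuidedModulation` would feed the Transformer blocks that apply `mask_aware_attention`; the actual fusion and masking strategies in InpaintFormer may differ.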
External IDs: dblp:conf/icmcs/OuyangXCHWXCW25