Facial Highlight Removal With Cross-Context Attention and Texture Enhancement

Published: 2025 · Last Modified: 18 Jan 2026 · IEEE Trans. Circuits Syst. Video Technol. 2025 · CC BY-SA 4.0
Abstract: Facial highlight removal aims to identify and remove specular highlight components in facial images, ensuring that the generated image has a consistent facial tone and high-fidelity texture detail. Existing methods struggle to simultaneously remove the highlight and recover the details in the disturbed areas, often leaving specular residues or distorting local details (i.e., texture, illumination, and color). To address these issues, this work proposes a novel two-stage facial highlight removal network (FHR-Net), which mainly consists of a Cross-Context Attention Module (CCAM) and a Texture Enhancement Module (TEM). In the first stage, guided by the detected highlight mask, the CCAM explicitly integrates cross-context information to obtain a coarse highlight removal result that is consistent with the surrounding facial context. Building upon the coarse result, the TEM in the second stage applies patch-wise attention to refine the texture details in the highlight areas, thereby producing a high-fidelity facial image. To improve coherence between the removed highlight areas and the non-highlight areas, this work introduces a face feature loss that encourages the processed highlight-disturbed areas to align well with the surrounding facial structure. Additionally, to address the lack of high-quality datasets in the research community and satisfy the training demands of data-driven facial highlight removal, this work builds a real-world Paired Facial Specular-Diffuse (PFSD) dataset through cross-polarization. Experimental results on PFSD and other datasets demonstrate that FHR-Net can effectively remove facial highlights and recover the original color and texture details.
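The abstract outlines a two-stage pipeline: a mask-guided cross-context attention stage producing a coarse highlight-free result, followed by a patch-wise attention stage refining texture. The sketch below is a minimal, hypothetical PyTorch rendering of that pipeline; the module names `CrossContextAttention`, `TextureEnhancement`, `FHRNetSketch`, and all layer choices, dimensions, and the specific attention formulation are assumptions made for illustration only and are not the authors' released architecture.

```python
# Hypothetical sketch only: layer choices, shapes, and the attention formulation
# are assumptions inferred from the abstract, not the paper's actual FHR-Net.
import torch
import torch.nn as nn


class CrossContextAttention(nn.Module):
    """Stage 1 (assumed): queries from highlight-masked features,
    keys/values from the surrounding non-highlight context."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, 3, padding=1)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Conv2d(dim, 3, 3, padding=1)

    def forward(self, img, mask):
        feat = self.proj(img)                            # B, C, H, W
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)         # B, HW, C
        m = mask.flatten(2).transpose(1, 2)              # B, HW, 1
        q = tokens * m                                   # highlight-area queries
        kv = tokens * (1 - m)                            # non-highlight context
        ctx, _ = self.attn(q, kv, kv)                    # cross-context attention
        fused = (tokens + ctx).transpose(1, 2).reshape(b, c, h, w)
        coarse = self.out(fused)
        return img * (1 - mask) + coarse * mask          # composite coarse result


class TextureEnhancement(nn.Module):
    """Stage 2 (assumed): patch-wise self-attention refining texture residually."""
    def __init__(self, dim=64, patch=8, heads=4):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, patch, stride=patch)       # patch tokens
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.expand = nn.ConvTranspose2d(dim, 3, patch, stride=patch)

    def forward(self, coarse):
        tok = self.embed(coarse)                         # B, C, H/p, W/p
        b, c, h, w = tok.shape
        seq = tok.flatten(2).transpose(1, 2)             # B, N_patches, C
        ref, _ = self.attn(seq, seq, seq)
        delta = self.expand(ref.transpose(1, 2).reshape(b, c, h, w))
        return coarse + delta                            # refined facial image


class FHRNetSketch(nn.Module):
    """Two-stage pipeline: coarse removal, then texture refinement."""
    def __init__(self):
        super().__init__()
        self.stage1 = CrossContextAttention()
        self.stage2 = TextureEnhancement()

    def forward(self, img, mask):
        return self.stage2(self.stage1(img, mask))


if __name__ == "__main__":
    x = torch.rand(1, 3, 128, 128)                         # facial image
    m = (torch.rand(1, 1, 128, 128) > 0.8).float()         # detected highlight mask
    print(FHRNetSketch()(x, m).shape)                      # torch.Size([1, 3, 128, 128])
```

In this reading, the highlight mask partitions the feature tokens into query and key/value sets so that disturbed regions attend only to clean context, while the second stage operates on the full coarse image at patch granularity; the paper's actual implementation, losses (including the face feature loss), and training on the PFSD dataset may differ substantially.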