Abstract: Specular highlight removal plays a pivotal role in multimedia applications, as it enhances the quality and interpretability of images and videos, ultimately improving the performance of downstream tasks such as content-based retrieval, object recognition, and scene understanding. Despite significant advances in deep learning-based methods, current state-of-the-art approaches often rely on additional priors or supervision, limiting their practicality and generalization capability. In this paper, we propose the Dual-Hybrid Attention Network for Specular Highlight Removal (DHAN-SHR), an end-to-end network that introduces novel hybrid attention mechanisms to effectively capture and process information across different scales and domains without relying on additional priors or supervision. DHAN-SHR consists of two key components: the Adaptive Local Hybrid-Domain Dual Attention Transformer (L-HD-DAT) and the Adaptive Global Dual Attention Transformer (G-DAT). The L-HD-DAT captures local inter-channel and inter-pixel dependencies while incorporating spectral domain features, enabling the network to effectively model the complex interactions between specular highlights and the underlying surface properties. The G-DAT models global inter-channel relationships and long-distance pixel dependencies, allowing the network to propagate contextual information across the entire image and generate more coherent and consistent highlight-free results. To evaluate the performance of DHAN-SHR and facilitate future research in this area, we compile a large-scale benchmark dataset comprising a diverse range of images with varying levels of specular highlights. Through extensive experiments, we demonstrate that DHAN-SHR outperforms 18 state-of-the-art methods both quantitatively and qualitatively, setting a new standard for specular highlight removal in multimedia applications. The code and dataset are available at https://github.com/CXH-Research/DHAN-SHR.
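The abstract distinguishes inter-channel attention, inter-pixel attention, and spectral-domain features. The following is a minimal NumPy sketch of those three ideas only, not the DHAN-SHR implementation. It assumes single-head attention with no learned projections (queries, keys, and values are all the raw feature map), and every function name here (`channel_attention`, `pixel_attention`, `spectral_features`) is hypothetical and chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(feat):
    """Inter-channel attention: a C x C map mixes channels (assumed form)."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                    # (C, N)
    attn = softmax(x @ x.T / np.sqrt(h * w))      # (C, C)
    return (attn @ x).reshape(c, h, w)

def pixel_attention(feat):
    """Inter-pixel attention: an N x N map mixes spatial positions (assumed form)."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T                  # (N, C)
    attn = softmax(x @ x.T / np.sqrt(c))          # (N, N)
    return (attn @ x).T.reshape(c, h, w)

def spectral_features(feat):
    """Spectral-domain view: per-channel 2D FFT amplitude (illustrative choice)."""
    return np.abs(np.fft.fft2(feat, axes=(-2, -1)))

# Toy feature map: 4 channels, 8 x 8 pixels.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))
out = channel_attention(feat) + pixel_attention(feat)
spec = spectral_features(feat)
print(out.shape, spec.shape)
```

The channel map costs O(C^2) memory while the pixel map costs O(N^2). Pixel-level attention is therefore often restricted to local windows in practice, although this sketch applies it to the whole toy feature map.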
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Content] Media Interpretation
Relevance To Conference: Specular highlight removal matters to the multimedia domain because it directly improves the quality, interpretability, and usability of visual content. Specular highlights can significantly impair tasks such as content-based retrieval, object recognition, and scene understanding. Our method removes these highlights while preserving the underlying diffuse components, which improves the accuracy, reliability, and efficiency of content analysis and processing. Its successful application to diverse real-world data underscores its potential to improve user experiences and facilitate effective multimodal content management. This aligns with the MM conference's emphasis on advancing multimedia and multimodal processing techniques: our work addresses a fundamental challenge in visual content processing and contributes to more robust and efficient multimedia systems.
Supplementary Material: zip
Submission Number: 1735