When Glass Disappears at Night: A Novel NIR-RGB Multi-modal Solution

Tao Yan; Yiwei Lu; Ke Xu; Hao Chen; Hui Li; Xiaojun Chang; Xiaojun Wu; Rynson W. H. Lau

18 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0

Keywords: Glass surface detection, multi-modal image, deep learning

TL;DR: We propose a novel approach for nighttime GSD based on multi-modal NIR and RGB image pairs, and construct a nighttime GSD dataset containing 6,192 RGB-NIR image pairs captured in diverse real-world nighttime scenes.

Abstract: Glass surface detection (GSD) has recently been attracting research interest. However, existing GSD methods focus on modeling glass surface properties in daytime scenes, and can easily fail in nighttime scenes due to significant lighting discrepancies. We observe that, due to the spectral differences between Near-Infrared (NIR) light sources and common LED lights, NIR and RGB cameras capture complementary visual patterns (e.g., light reflections, shadows, and edges) of glass surfaces, and that cross-comparing their lighting and reflectance information can provide reliable cues for GSD at nighttime. Inspired by this observation, we propose a novel approach for nighttime GSD based on multi-modal NIR and RGB image pairs. We first construct a nighttime GSD dataset, which contains 6,192 RGB-NIR image pairs captured in diverse real-world nighttime scenes, with corresponding carefully annotated glass surface masks. We then propose a novel network for the nighttime GSD task with two novel modules: (1) an RGB-NIR Guidance Enhancement (RNGE) module for extracting and enriching the NIR reflectance features with the guidance of RGB reflectance features, and (2) an RGB-NIR Fusion and Localization (RNFL) module for fusing RGB and NIR reflectance features into glass features conditioned on the multi-modal illumination discrepancy-aware features. Extensive experiments demonstrate that our method outperforms state-of-the-art methods in nighttime scenes while generalizing well to daytime scenes. We will release our dataset and code.

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 12613
