Seeing Text in the Dark: Algorithm and Benchmark

Published: 20 Jul 2024 · Last Modified: 21 Jul 2024 · MM2024 Poster · CC BY 4.0
Abstract: Localizing text in low-light environments is challenging due to visual degradations. Although a straightforward solution is a two-stage pipeline with low-light image enhancement (LLE) as the initial step followed by detection, LLE is primarily designed for human vision rather than machine vision and can accumulate errors. In this work, we propose an efficient and effective single-stage approach for localizing text in the dark that circumvents the need for LLE. We introduce a constrained learning module as an auxiliary mechanism during the training stage of the text detector. This module guides the text detector to preserve textual spatial features during feature-map resizing, minimizing the loss of spatial information of text under low-light visual degradations. Specifically, we incorporate spatial reconstruction and spatial semantic constraints within this module to ensure the text detector acquires essential positional and contextual-range knowledge. Our approach enhances the original text detector's ability to capture the local topological features of text using a dynamic snake feature pyramid network, and adopts a bottom-up contour-shaping strategy with a novel rectangular accumulation technique for accurate delineation of streamlined text features. In addition, we present a comprehensive low-light dataset for arbitrary-shaped text, encompassing diverse scenes and languages. Notably, our method achieves state-of-the-art results on this low-light dataset and exhibits comparable performance on standard normal-light datasets. The code and dataset will be released.
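The abstract describes a training objective that adds auxiliary constraints to the detection loss so spatial detail survives feature-map resizing. The following is a minimal, purely illustrative sketch of that idea: a spatial-reconstruction term penalizes information lost in a down/up-sampling round trip, and it is combined with the detection and semantic losses. All function names, the pooling scheme, and the weights are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch (assumed names/weights): an auxiliary spatial-reconstruction
# constraint added to the detector's training loss. Feature maps are plain 2D lists.

def downsample_2x(fmap):
    """Average-pool a 2D feature map by a factor of 2 (the resizing step)."""
    h, w = len(fmap), len(fmap[0])
    return [[(fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

def upsample_2x(fmap):
    """Nearest-neighbour upsampling back to the original resolution."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def spatial_reconstruction_loss(fmap):
    """Mean squared error between a feature map and its down/up-sampled
    reconstruction: a proxy for spatial information lost to resizing."""
    recon = upsample_2x(downsample_2x(fmap))
    h, w = len(fmap), len(fmap[0])
    return sum((fmap[i][j] - recon[i][j]) ** 2
               for i in range(h) for j in range(w)) / (h * w)

def total_loss(det_loss, fmap, sem_loss, w_rec=0.5, w_sem=0.5):
    """Detection loss plus weighted auxiliary constraints (training-time only;
    the auxiliary module is discarded at inference)."""
    return det_loss + w_rec * spatial_reconstruction_loss(fmap) + w_sem * sem_loss
```

A uniform map loses nothing to resizing (`spatial_reconstruction_loss([[1, 1], [1, 1]]) == 0.0`), while a map with fine horizontal variation, such as `[[0, 2], [0, 2]]`, is penalized because averaging destroys the detail.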
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Content] Media Interpretation
Relevance To Conference: Visual degradations in low-light environments significantly challenge the perception of multimedia information, and text localization is a crucial upstream task for multimedia understanding and interpretation. While traditional low-light image enhancement techniques primarily focus on amplifying visual signals in images, localizing text in dark conditions presents a unique challenge: it requires processing multimodal information that combines the rich visual and semantic features of text in natural scenes. Our research takes an innovative approach to text localization under such conditions. We developed a text detector that departs from conventional low-light image enhancement methods and instead leverages the intrinsic visual and contextual knowledge of text. This strategy not only improves detection accuracy in low-light environments but also helps fill a gap in this research area. Additionally, we introduce the first low-light dataset for arbitrary-shaped text, featuring diverse scenes and languages. This dataset is a vital resource for advancing research in multimedia understanding and interpretation under extreme low-light conditions.
Supplementary Material: zip
Submission Number: 1675