Scene text image super-resolution with semantic-aware interaction

Published: 2025 · Last Modified: 23 Sept 2025 · Eng. Appl. Artif. Intell. 2025 · CC BY-SA 4.0
Abstract: Scene text image super-resolution aims to enhance the resolution of images containing text in various scenes, making the text more prominent and easier to recognize. Existing methods struggle to accurately localize text regions in high-noise environments, which hampers targeted super-resolution. To address this, we propose a Text image Super-resolution Semantic-aware Interaction Network (TSSIN) that embeds a text region segmentation network. First, we use a pre-trained text region segmentation network (TRSN) to extract text region information, which semantically guides the model so that it can perform targeted super-resolution of text even in high-noise environments. Second, we propose a multi-modal semantic information interaction module (MSIIM) to mitigate insufficient global information exchange. Comprehensive experiments on the TextZoom dataset demonstrate that TSSIN significantly enhances image quality and clearly outperforms state-of-the-art methods, achieving an average text recognition accuracy improvement of +1.0% over the Transformer-Based Super-Resolution Network (TBSRN) (49.6%, 56.2%, 60.1% vs. 48.1%, 55.9%, 58.9%), +0.83% over the Parallelly Contextual Attention Network (PCAN), and +1.46% over Text Prior Guided Super-Resolution (TPGSR), as evaluated by three pre-trained text recognition models. Code is available at https://github.com/ads2d/TSSIN.
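The abstract's two ideas — a segmentation prior that focuses super-resolution on text regions, and a module that exchanges global information — can be illustrated with a minimal NumPy sketch. This is not the authors' TSSIN implementation; the function names, the gating formula, and the `boost` parameter are hypothetical stand-ins for the TRSN-guided enhancement and the MSIIM-style global interaction:

```python
import numpy as np

def text_region_gate(features, text_mask, boost=1.5):
    """Amplify feature responses inside predicted text regions.

    features : (C, H, W) feature map from an SR backbone (assumed layout)
    text_mask: (H, W) text-probability map from a segmentation network
    boost    : extra gain applied where text is likely (hypothetical value)
    """
    # Gain is 1.0 where the mask is 0 (background) and `boost` where it is 1 (text).
    gate = 1.0 + (boost - 1.0) * text_mask
    return features * gate[None, :, :]  # broadcast the gate over channels

def global_interaction(features):
    """Toy stand-in for global information exchange: blend each spatial
    position with the channel-wise global average of the feature map."""
    g = features.mean(axis=(1, 2), keepdims=True)  # (C, 1, 1) global context
    return 0.5 * features + 0.5 * g
```

In a real network these operations would be learned layers (e.g. spatial attention conditioned on the TRSN mask, and cross-attention for the multi-modal interaction); the sketch only shows where the segmentation prior enters the SR pipeline.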