MPGTSRN: Scene Text Image Super-Resolution Guided by Multiple Visual-Semantic Prompts

Mingjun Li, Zeming Zhuang, Shuo Xu, Feng Su

Published: 2024, Last Modified: 05 Mar 2025, Venue: ICPR (32) 2024, License: CC BY-SA 4.0

Abstract: Scene text image super-resolution (STISR) aims at enhancing the visual clarity of a low-resolution text image for human perception or tasks like text recognition. In recent STISR work, various visual and semantic clues of the text play a key role in recovering the details of the text, but the utilization of different clues and their interactions is still insufficient, which often results in distorted or blurred appearances of the reconstructed text. To address this problem, we propose a multi-prompt guided text image super-resolution network (MPGTSRN). Specifically, we introduce multiple visual prompts for the text and combine them with semantic features to comprehensively capture the diverse characteristics of the text. We then propose a recurrent reconstruction network integrating multiple visual-semantic prompts to enhance the representation of the text and yield a high-resolution text image. We further propose a cross-representation attention mechanism that utilizes the complementarity of different prompts to guide the reconstruction network to adaptively focus on salient parts of the text and effectively improves the text details. The experimental results show the superiority of our proposed MPGTSRN in the STISR task.
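
The abstract does not spell out how the cross-representation attention is computed. As a rough illustration only, the sketch below shows generic scaled dot-product cross-attention, in which image-feature positions (queries) attend over prompt tokens (keys and values). This is an assumed, textbook formulation and not the authors' mechanism; the names `cross_attention`, `image_feats`, and `prompt_feats` and the toy values are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product attention (assumed form, not the paper's).

    Each query vector is scored against every key; the softmax of those
    scores weights the corresponding value vectors.
    """
    d = len(keys[0])
    out_dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(out_dim)]
        )
    return outputs


# Hypothetical toy data: 2 image-feature positions, 3 prompt tokens.
image_feats = [[1.0, 0.0], [0.0, 1.0]]
prompt_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Image features query the prompt tokens; prompts serve as keys and values.
attended = cross_attention(image_feats, prompt_feats, prompt_feats)
```

Each row of `attended` is a weighted average of the prompt vectors, with larger weights on the prompts most similar to that image position. A real model would add learned projections for queries, keys, and values, which are omitted here for brevity.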