DEPP: dictionary embedded probabilistic priors for scene text image super-resolution

Published: 01 Jan 2025, Last Modified: 23 Oct 2025
Venue: Neural Comput. Appl. 2025
License: CC BY-SA 4.0
Abstract: Scene text image super-resolution (STISR), often considered a preliminary step for scene text recognition, is the task of enhancing the resolution of text embedded in natural scene images, and it plays a vital role in various applications. Most existing STISR methods either apply deep convolutional neural networks, treating text images as natural scene images, or use a text recognizer's feedback to guide the STISR process. However, because text recognition is initially performed on low-resolution images, its output is largely inaccurate, increasingly so as word length grows, which degrades the super-resolution process. In this paper, we introduce DEPP, which uses dictionary embedding (DE) based probabilistic priors calculated from a large English text corpus containing both letters and digits. The resulting initial-state and bigram probabilities are fused with the probabilities produced by the recognizer, and the fused result is passed to a single image super-resolution (SISR) block. By integrating DE as a prior and implementing a modified perceptual loss, the method effectively captures the contextual information of text, enabling more accurate super-resolution and visually pleasing results. Experimental results on the benchmark TextZoom dataset demonstrate that our DEPP framework achieves superior performance compared to most existing approaches, particularly for medium-length and long words, as measured by text recognition accuracy. Because DEPP uses text recognition attributes to rectify or guide the super-resolution process, it is more domain-inspired and task-aware than typical black-box deep learners.
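To make the idea of dictionary-based priors concrete, the following is a minimal sketch of how initial-state and bigram probabilities might be estimated from a word list and combined with per-position recognizer probabilities. It is not the DEPP implementation: the abstract does not specify the fusion rule, so the convex combination, the `weight` parameter, the add-one smoothing, and the function names `build_priors` and `fuse` are all assumptions made for illustration.

```python
import string

# Assumed symbol set: lowercase letters and digits (36 classes).
ALPHABET = string.ascii_lowercase + string.digits
INDEX = {c: i for i, c in enumerate(ALPHABET)}


def build_priors(words, smoothing=1.0):
    """Estimate initial-state and bigram probabilities from a word list.

    Additive smoothing (an assumption) keeps unseen symbols and pairs
    from receiving zero probability.
    """
    n = len(ALPHABET)
    init_counts = [smoothing] * n
    bigram_counts = [[smoothing] * n for _ in range(n)]
    for word in words:
        chars = [c for c in word.lower() if c in INDEX]
        if not chars:
            continue
        init_counts[INDEX[chars[0]]] += 1
        for a, b in zip(chars, chars[1:]):
            bigram_counts[INDEX[a]][INDEX[b]] += 1
    init_total = sum(init_counts)
    initial = [c / init_total for c in init_counts]
    bigram = []
    for row in bigram_counts:
        row_total = sum(row)
        bigram.append([c / row_total for c in row])
    return initial, bigram


def fuse(recognizer_probs, initial, bigram, weight=0.5):
    """Fuse recognizer probabilities with the dictionary prior per position.

    Hypothetical rule: position 0 uses the initial-state prior; later
    positions use the bigram transition expected under the previous
    fused distribution. The two sources are mixed with `weight`.
    """
    n = len(initial)
    fused = []
    prev = None
    for probs in recognizer_probs:
        if prev is None:
            prior = initial
        else:
            prior = [sum(prev[a] * bigram[a][b] for a in range(n))
                     for b in range(n)]
        mixed = [(1 - weight) * probs[i] + weight * prior[i] for i in range(n)]
        total = sum(mixed)
        prev = [m / total for m in mixed]
        fused.append(prev)
    return fused
```

For example, calling `build_priors(["the", "then", "that"])` and passing a completely uninformative (uniform) recognizer output to `fuse` shifts the first-position distribution toward `t`, which illustrates how a corpus prior can compensate when recognition on a low-resolution image is unreliable.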