TPANet: Scene Text Detection Based on Texture Refinement and Patch-Driven Attention with Cross-Level Feature Integration

Published: 2025, Last Modified: 02 Apr 2026, Venue: ICIC (3) 2025, License: CC BY-SA 4.0
Abstract: Scene text detection has developed rapidly and is now widely used in real-world scenarios such as road signs, billboards, license plates, and digital documents, where it has achieved remarkable results. However, existing text detection methods still face two challenges: 1) many methods struggle to handle significant changes in text scale, particularly for small-scale text, where size differences are usually difficult to capture; 2) traditional detection models still have difficulty detecting text in irregular forms, such as curved or rotated text. To address these problems, this paper proposes TPANet, a scene text detection framework that adaptively fuses feature pyramids. First, a Texture Refinement Unit (TRU) enhances the texture representation of text and retains fine details. Second, a Cross-Level Feature Integration Module (CFIM) selectively fuses features from different stages, refining text features at different levels and reducing interference from background noise. Third, a Patch-Driven Parallel Attention Module (PPAM) enhances feature extraction by dynamically adjusting the receptive field and selectively emphasizing key text regions. Experimental results show that the F-measure improves by 4.0% and 1.8% on the ICDAR 2015 and Total-Text datasets, respectively, verifying the framework's effectiveness for text detection in complex scenes.