Not All Texts Are the Same: Dynamically Querying Texts for Scene Text Detection

Published: 01 Jan 2024, Last Modified: 17 May 2025, Venue: PRCV (7) 2024, License: CC BY-SA 4.0
Abstract: In recent years, scene text detection has witnessed considerable advancements. However, existing methods do not dynamically mine the diverse text characteristics within each image to adaptively adjust model parameters, resulting in suboptimal detection performance. To address this issue, we propose a simple yet effective segmentation-based model named Text Query Detector (TQD), inspired by the recently popular transformer. TQD implicitly queries textual information and flexibly generates convolution parameters with a global receptive field. In addition, we decouple the features used for parameter generation from those used for dynamic convolution to maximize the benefits of both the transformer and convolution. Extensive experiments demonstrate that our approach strikes an ideal tradeoff between accuracy and speed on prevalent benchmarks. In particular, on MSRA-TD500 and ICDAR2015, TQD achieves state-of-the-art results while maintaining high speed. Code is available at: https://github.com/TangLinJie/TQD.
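
The idea of query-generated dynamic convolution can be illustrated with a minimal NumPy sketch. This is not the TQD implementation (see the linked repository for that); it only assumes one plausible reading of the abstract: a set of queries cross-attends over every spatial position of one feature map (giving a global receptive field), the attended vectors are projected into per-image 1x1 convolution weights, and those weights are applied to a separate, decoupled feature map. All names, shapes, and the single-head attention used here (`generate_kernels`, `dynamic_conv1x1`, `w_out`, `demo`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def generate_kernels(queries, param_feats, w_out):
    """Hypothetical parameter generator (an assumption, not the paper's module).

    queries:     (Q, C) query vectors (random here; learned in a real model)
    param_feats: (C, H, W) features used only for parameter generation
    w_out:       (C, C_dyn + 1) projection to kernel weights plus one bias
    Returns weights of shape (Q, C_dyn) and biases of shape (Q,).
    """
    c, h, w = param_feats.shape
    tokens = param_feats.reshape(c, h * w).T                  # (HW, C)
    # Each query attends to all H*W positions: a global receptive field.
    attn = softmax(queries @ tokens.T / np.sqrt(c), axis=-1)  # (Q, HW)
    attended = attn @ tokens                                  # (Q, C)
    params = attended @ w_out                                 # (Q, C_dyn + 1)
    return params[:, :-1], params[:, -1]


def dynamic_conv1x1(conv_feats, weights, bias):
    """Apply per-image 1x1 kernels to a decoupled feature map.

    conv_feats: (C_dyn, H, W); returns logits of shape (Q, H, W).
    """
    return np.einsum("qc,chw->qhw", weights, conv_feats) + bias[:, None, None]


def demo(seed=0):
    """Run the sketch on random data; all sizes are arbitrary assumptions."""
    rng = np.random.default_rng(seed)
    num_q, c, c_dyn, h, w = 4, 16, 8, 5, 7
    queries = rng.standard_normal((num_q, c))
    w_out = rng.standard_normal((c, c_dyn + 1))
    param_feats = rng.standard_normal((c, h, w))   # branch for parameter generation
    conv_feats = rng.standard_normal((c_dyn, h, w))  # branch for dynamic convolution
    weights, bias = generate_kernels(queries, param_feats, w_out)
    logits = dynamic_conv1x1(conv_feats, weights, bias)
    return weights, bias, logits
```

Because the kernels are computed from the image's own features, two different inputs yield two different sets of convolution weights, which is the sense in which the parameters are adjusted per image in this sketch.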