DQ-DETR: Dynamic Queries Enhanced Detection Transformer for Arbitrary Shape Text Detection

Published: 01 Jan 2023, Last Modified: 09 Nov 2024. Venue: ICDAR (2) 2023. License: CC BY-SA 4.0
Abstract: We propose a new Transformer-based text detection model, named Dynamic Queries enhanced DEtection TRansformer (DQ-DETR), to detect arbitrary-shape text instances in images with high localization accuracy. Unlike previous Transformer-based methods, which take all control points on the boundaries or center-lines of all text instances as the queries of every Transformer decoder layer, we extend the query set gradually from one decoder layer to the next, allowing DQ-DETR to achieve higher localization accuracy by detecting the control points of each text instance progressively. Specifically, after refining the positions of the existing control points from the preceding decoder layer, each decoder layer appends a new point on each side of each center-line segment; these points are input to the next decoder layer as additional queries for detecting new control points. Because the offsets from the new control points to the added reference points are small, their positions can be predicted more precisely, leading to higher center-line detection accuracy. Consequently, DQ-DETR achieves state-of-the-art performance on five public text detection benchmarks: MLT2017, Total-Text, CTW1500, ArT and DAST1500.
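The progressive query extension described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "a new point on each side of each center-line segment" means one extra reference point beyond each end of the current center-line polyline, placed by linear extrapolation, and that the last layer only refines. The names `extend_centerline`, `run_decoder`, and `refine` are hypothetical, and `refine` stands in for a decoder layer that predicts small offsets.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def extend_centerline(points: List[Point]) -> List[Point]:
    """Add one reference point beyond each end of a center-line polyline.

    Assumption: the new points are linear extrapolations of the first and
    last segments. The abstract does not say how they are initialised.
    """
    if len(points) < 2:
        raise ValueError("need at least two control points")
    (x0, y0), (x1, y1) = points[0], points[1]
    (xa, ya), (xb, yb) = points[-2], points[-1]
    head = (2 * x0 - x1, 2 * y0 - y1)  # mirror of points[1] about points[0]
    tail = (2 * xb - xa, 2 * yb - ya)  # mirror of points[-2] about points[-1]
    return [head] + list(points) + [tail]


def run_decoder(
    points: List[Point],
    refine: Callable[[List[Point]], List[Point]],
    num_layers: int,
) -> List[Point]:
    """Refine existing points at every layer, then grow the query set.

    Assumption: the final layer refines only, since no later layer would
    consume the newly added queries.
    """
    for layer in range(num_layers):
        points = refine(points)  # stand-in for a decoder layer's offset prediction
        if layer < num_layers - 1:
            points = extend_centerline(points)
    return points


def identity_refine(pts: List[Point]) -> List[Point]:
    """Placeholder refinement that leaves every point where it is."""
    return list(pts)


result = run_decoder([(0.0, 0.0), (1.0, 0.0)], identity_refine, num_layers=3)
```

Under these assumptions the query set grows by two points per layer transition, so each newly added query starts close to the control point it has to locate, which is the property the abstract credits for the more precise offset prediction.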