Efficient and Accurate Text Detection Combining Differentiable Binarization with Semantic Segmentation

Published: 01 Jan 2022, Last Modified: 11 Apr 2025. Venue: ICANN (3) 2022. License: CC BY-SA 4.0
Abstract: Recently, segmentation-based methods have quickly become the mainstream approach to scene text detection, owing to their precise description of arbitrary-shaped text. However, their lower inference speed hinders their practical application. In this paper, we propose an efficient and accurate arbitrary-shaped text detector named ViT-Bilateral DBNet, which improves the efficiency of feature processing to achieve a good trade-off between accuracy and real-time performance. Specifically, we first combine Differentiable Binarization (DB) with the real-time semantic segmentation network BiSeNet V2, which is better suited to processing features for segmentation-based methods. We then propose three improvements to optimize this initial integrated network. The ViT-Bilateral Network strengthens the feature-extraction capability of the network. The Attention-driven Aggregation Layer (AAL) adaptively fuses the detail and semantic features produced by the ViT-Bilateral Network. In addition, an auxiliary loss is added to make training more thorough. Compared with the original DBNet, our method improves accuracy by 1.17% on IC15 and 1.34% on CTW1500 while running 1.38 and 1.34 times faster, respectively. Notably, our detector surpasses the previous best result while maintaining a high inference speed.
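For readers unfamiliar with Differentiable Binarization: the original DBNet replaces the hard step B = 1[P >= t] with an approximate binary map B = 1 / (1 + exp(-k(P - T))), where P is the predicted probability map, T is a learned per-pixel threshold map, and k is an amplifying factor (50 in the original DBNet). Because this steep sigmoid is differentiable, with dB/dP = k·B·(1 - B), the threshold can be learned end to end. The sketch below is a minimal NumPy illustration of that standard formula only. It assumes ViT-Bilateral DBNet keeps DB unchanged, it is not the authors' code, and the function names are hypothetical.

```python
import numpy as np


def differentiable_binarization(prob_map: np.ndarray, thresh_map: np.ndarray,
                                k: float = 50.0) -> np.ndarray:
    """Approximate binary map B = 1 / (1 + exp(-k * (P - T))).

    prob_map:   probability map P, values in [0, 1].
    thresh_map: per-pixel threshold map T, same shape as P.
    k:          amplifying factor (50 in the original DBNet).
    """
    if prob_map.shape != thresh_map.shape:
        raise ValueError("prob_map and thresh_map must have the same shape")
    # A steep sigmoid of (P - T): nearly a step function, but differentiable.
    return 1.0 / (1.0 + np.exp(-k * (prob_map - thresh_map)))


def db_grad_wrt_prob(prob_map: np.ndarray, thresh_map: np.ndarray,
                     k: float = 50.0) -> np.ndarray:
    """Analytic gradient dB/dP = k * B * (1 - B) of the approximate map."""
    b = differentiable_binarization(prob_map, thresh_map, k)
    return k * b * (1.0 - b)


def hard_binarization(prob_map: np.ndarray, t: float) -> np.ndarray:
    """Standard step binarization with a fixed threshold t (not differentiable)."""
    return (prob_map >= t).astype(np.float32)


if __name__ == "__main__":
    P = np.array([[0.9, 0.5, 0.1]])
    T = np.full_like(P, 0.5)  # hypothetical constant threshold map for the demo
    print(differentiable_binarization(P, T))  # approximately [[1.0, 0.5, 0.0]]
    print(db_grad_wrt_prob(P, T))             # largest where P is close to T
    print(hard_binarization(P, 0.5))
```

The gradient peaks where P is close to T, so most of the learning signal concentrates near text boundaries, which is the motivation the original DBNet gives for this formulation.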