Arbitrary-Shaped Scene Text Recognition with Deformable Ensemble Attention

Published: 01 Jan 2024, Last Modified: 05 Mar 2025 | ICPR (31) 2024 | CC BY-SA 4.0
Abstract: Scene text recognition (STR) is a challenging task that aims to automatically localize and recognize text in varied natural scenes. Although the performance of STR methods has improved significantly, the problem is far from solved, especially for text with complex shapes and intricate backgrounds. To increase the accuracy of STR models on arbitrary-shaped text and their robustness to interference such as noise and adjacent objects, we propose a novel deformable ensemble attention model and DEATRN, a scene text recognition network based on it. The attention model combines the flexibility of an ensemble of deformable 2D local attentions, which retrieve discriminative character features, with constraints on the regularity of the overall text shape, as described by its parametric centerline; this combination effectively enhances the text recognition performance of DEATRN. We also propose effective text geometry-based loss terms to improve the accuracy of attention. Experimental results show the superiority of DEATRN in recognizing arbitrary-shaped text in real scenarios.