Scene Text Recognition with Heuristic Local Attention

Published: 01 Jan 2022, Last Modified: 19 Jul 2024, IEEE Big Data 2022, License: CC BY-SA 4.0
Abstract: Scene text recognition can be treated as a sequence labeling problem. In this task, the alignment between the scene text image and the output text is monotonic: characters that appear later in the output text correspond to image regions that lie further along the text line. However, existing global attention-based methods attend to too much irrelevant information, which leads to alignment drift. In contrast, local attention selects only the subset of the feature representation most relevant to the current character. In this paper, we explore local attention mechanisms as a replacement for global attention in the decoder. To this end, we revisit several variants of local attention and provide a comprehensive comparison, which has so far been missing from the scene text recognition literature. In particular, we introduce two Heuristic approaches for Local Attention (HLA) and demonstrate that monotonic alignment improves performance significantly. Evaluations on standard benchmarks show that the local attention method outperforms existing global attention methods.
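To make the contrast between global and local attention concrete, the following is a minimal Python sketch using dot-product scores over a toy sequence of feature columns. It is not the paper's HLA method. The window-centering rule `monotonic_centers` is a hypothetical heuristic that spaces window centers evenly along the feature sequence so that each decoding step looks at or to the right of the previous one (monotonic alignment). All function names and parameters (`half_width`, `num_steps`, `num_columns`) are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def global_attention(query: Vector, features: Sequence[Vector]) -> List[float]:
    """Global attention: every feature column receives a nonzero weight."""
    return softmax([dot(query, f) for f in features])


def local_attention(query: Vector, features: Sequence[Vector],
                    center: int, half_width: int) -> List[float]:
    """Local attention: weights are computed only inside the window
    [center - half_width, center + half_width]; all other columns get 0."""
    lo = max(0, center - half_width)
    hi = min(len(features), center + half_width + 1)
    window = softmax([dot(query, features[j]) for j in range(lo, hi)])
    weights = [0.0] * len(features)
    for j, w in zip(range(lo, hi), window):
        weights[j] = w
    return weights


def monotonic_centers(num_steps: int, num_columns: int) -> List[int]:
    """Hypothetical heuristic (not from the paper): center step t's window in
    the middle of the t-th of num_steps equal slices of the feature sequence,
    so window centers never move backwards."""
    return [((2 * t + 1) * num_columns) // (2 * num_steps) for t in range(num_steps)]


def context_vector(weights: Sequence[float], features: Sequence[Vector]) -> List[float]:
    """Attention-weighted sum of feature columns, passed to the decoder."""
    dim = len(features[0])
    return [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]


if __name__ == "__main__":
    features = [[float(j), 1.0] for j in range(8)]  # toy 8-column feature map
    query = [0.1, 1.0]                              # toy decoder query
    print("global:", [round(x, 2) for x in global_attention(query, features)])
    for t, c in enumerate(monotonic_centers(num_steps=4, num_columns=8)):
        w = local_attention(query, features, center=c, half_width=1)
        print("step", t, "center", c, [round(x, 2) for x in w])
```

With 4 decoding steps over 8 columns, `monotonic_centers(4, 8)` returns `[1, 3, 5, 7]`. Each local window therefore advances left to right, while global attention spreads weight over every column at every step. Using a hard window with a softmax inside it is one simple design choice; smoother variants, such as multiplying scores by a Gaussian centered on the window, are equally plausible readings of "local attention".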