A Seq2seq-based Model with Global Semantic Context for Scene Text Recognition

Yi-Li Huang, Shi-Lin Wang, Chengyu Gu, Zheng Huang, Kai Chen

2021 (modified: 08 Nov 2022) | DICTA 2021 | Readers: Everyone

Abstract: Scene text recognition (STR), with its various applications, has become a popular research topic. With the rise of deep learning, many sequence-to-sequence (seq2seq) models have been proposed. However, the teacher-forcing training method used in seq2seq models gives rise to the problem of exposure bias. Moreover, autoregressive decoding limits the ability of seq2seq models to utilize future semantic information. To solve these problems, a new Transformer-based network is proposed in this paper. A Re-Embedding Layer with a sampling module is introduced to overcome the problem of exposure bias, and a context fusion module (CFM) is designed to model global context information. Experimental results on several benchmarks demonstrate the effectiveness of the proposed method in scene text recognition.
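As a rough illustration of the exposure bias mentioned in the abstract: under teacher forcing the decoder only ever sees ground-truth previous characters during training, whereas at inference it sees its own (possibly wrong) predictions. One generic way to narrow this gap is to sometimes feed the model's own predictions during training, in the style of scheduled sampling. The sketch below shows only that input-mixing idea; it is not the paper's Re-Embedding Layer, and the function name `decoder_inputs`, its parameters, and the `<s>` start token are hypothetical names chosen for illustration.

```python
import random


def decoder_inputs(gold, predicted, sample_prob, rng, bos="<s>"):
    """Build the per-step decoder inputs for one training sequence.

    gold:        ground-truth characters of the text.
    predicted:   the model's own predictions for the same positions
                 (assumed to come from an earlier forward pass).
    sample_prob: probability of feeding the model's prediction instead
                 of the ground truth at each step.
                 0.0 reproduces plain teacher forcing.
    rng:         a random.Random instance, passed in for reproducibility.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    inputs = [bos]  # step 0 always receives the start token
    for t in range(1, len(gold)):
        # Choose which previous character the decoder is conditioned on.
        use_model_output = rng.random() < sample_prob
        inputs.append(predicted[t - 1] if use_model_output else gold[t - 1])
    return inputs


if __name__ == "__main__":
    gold = list("hello")
    predicted = list("hallo")  # hypothetical prediction with one error
    rng = random.Random(0)
    print(decoder_inputs(gold, predicted, 0.0, rng))  # teacher forcing
    print(decoder_inputs(gold, predicted, 1.0, rng))  # always own outputs
```

With `sample_prob=0.0` the inputs are the ground truth shifted right by one position, which is teacher forcing; with `sample_prob=1.0` the decoder is conditioned only on its own predictions, as it would be at inference time.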
