Abstract: Scene text recognition (STR), with various applications, has become a popular research. With deep learning, many sequence to sequence (seq2seq) models have been proposed. However, the Teacher-Forcing training method used in the seq2seq models gave rise to the problem of exposure bias. Moreover, the autoregressive decoding manner limits seq2seq models ability of utilizing future semantic information. To solve these problems, a new Transformer-based network is proposed in this paper. A Re-Embedding Layer with sampling module is introduced to overcome the problem of exposure bias and a context fusion module (CFM) is designed to model global context information. Experiment results on several benchmarks have demonstrated the effectiveness of the proposed method in scene text recognition.
0 Replies
Loading