Abstract: Handwritten Chinese text recognition (HCTR) is challenging due to its thousands of characters, diverse writing styles, and ambiguous segmentation. Currently, methods based on connectionist temporal classification (CTC) are widely used, but their independent decoding prevents them from leveraging contextual information effectively. In contrast, autoregressive methods benefit from contextual reasoning, but they use only half of the available context because they decode in a single direction. This article proposes a multi-modal attention-based framework for offline HCTR capable of both visual and semantic reasoning. Moreover, a novel mask-guided context-selective decoder is presented that guides the network to decode with randomly selected bidirectional context, further improving its semantic reasoning ability. Extensive experiments show that the proposed method significantly outperforms previous methods.