Improving CTC-based Handwritten Chinese Text Recognition with Cross-Modality Knowledge Distillation and Feature Aggregation
Abstract: Offline handwritten Chinese text recognition (HCTR) models based on connectionist temporal classification (CTC) have recently achieved impressive results. However, due to the conditional independence assumption and per-frame prediction, CTC-based models can neither capture semantic relationships between output tokens nor leverage global visual features of characters. To address these issues, we propose a cross-modality knowledge distillation approach that leverages a pre-trained language model (BERT) to transfer contextual semantic information, and design a feature aggregation module that dynamically aggregates local and global features. Experimental results on the HCTR datasets CASIA-HWDB, ICDAR2013, and HCCDoc show that the proposed method significantly improves the model’s performance.
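To make the feature aggregation idea concrete, here is a minimal sketch of one plausible form of dynamic local/global aggregation: a learned sigmoid gate that decides, per frame and per channel, how much of the local (per-frame) versus global (sequence-level) feature to keep. All names and the gating formulation are illustrative assumptions, not the paper's actual module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class FeatureAggregator:
    """Hypothetical gated fusion of local and global features.

    A gate computed from the concatenated [local; global] features
    produces a per-frame, per-channel convex combination of the two.
    """

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        # gate projection over the concatenated [local; global] features
        self.W = rng.standard_normal((2 * dim, dim)) * 0.1
        self.b = np.zeros(dim)

    def __call__(self, local_feats, global_feat):
        # local_feats: (T, dim) per-frame features; global_feat: (dim,)
        T, dim = local_feats.shape
        g = np.broadcast_to(global_feat, (T, dim))
        gate = sigmoid(np.concatenate([local_feats, g], axis=1) @ self.W + self.b)
        # convex combination: gate selects local vs. global per channel
        return gate * local_feats + (1.0 - gate) * g

# Usage: fuse 5 frame-level features with a mean-pooled global feature.
agg = FeatureAggregator(dim=8)
local = np.random.default_rng(1).standard_normal((5, 8))
fused = agg(local, local.mean(axis=0))
print(fused.shape)  # (5, 8)
```

Because the gate lies in (0, 1), each fused value stays between the corresponding local and global values, so the module can interpolate smoothly between purely local and purely global evidence.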