A Multiplexed Network for End-to-End, Multilingual OCR

Rama Kovvuri, Jing Huang, Guan Pang, Kevin J Liang, Xi Yin, Tal Hassner

18 Nov 2022, OpenReview Archive Direct Upload, Readers: Everyone

Abstract: Recent advances in OCR have shown that an end-to-end (E2E) training pipeline that includes both detection and recognition leads to the best results. However, many existing methods focus primarily on Latin-alphabet languages, and often only on case-insensitive English characters. In this paper, we propose an E2E approach, Multiplexed Multilingual Mask TextSpotter, that performs script identification at the word level and handles different scripts with different recognition heads, all while maintaining a unified loss that simultaneously optimizes script identification and multiple recognition heads. Experiments show that our method outperforms a single-head model with a similar number of parameters in end-to-end recognition tasks, and achieves state-of-the-art results on the MLT17 and MLT19 joint text detection and script identification benchmarks. We believe that our work is a step towards an end-to-end trainable and scalable multilingual, multi-purpose OCR system. Our code and model will be released.
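
The idea of word-level script identification routing each word to a script-specific recognition head, trained under one combined loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the form of the unified loss, so the sum of a script-identification cross-entropy and the mean per-character cross-entropy of the ground-truth script's head is an assumption made here for illustration. The script inventory, character sets, and function names (`route`, `decode`, `multiplexed_loss`) are likewise hypothetical.

```python
import math

# Hypothetical script inventory and toy per-head character sets (assumptions).
SCRIPTS = ["latin", "cyrillic"]
CHARSETS = {"latin": "abc", "cyrillic": "абв"}


def cross_entropy(logits, target):
    """Softmax cross-entropy for one prediction, computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


def route(script_logits):
    """Pick the recognition head for a word from its script logits."""
    return SCRIPTS[argmax(script_logits)]


def decode(script_logits, head_char_logits):
    """Inference: choose a head by predicted script, then read its characters."""
    script = route(script_logits)
    charset = CHARSETS[script]
    return script, "".join(charset[argmax(l)] for l in head_char_logits[script])


def multiplexed_loss(script_logits, head_char_logits, true_script, true_text):
    """Assumed combined loss: script-ID term plus recognition term of the true head."""
    loss_script = cross_entropy(script_logits, SCRIPTS.index(true_script))
    charset = CHARSETS[true_script]
    per_char = [
        cross_entropy(logits, charset.index(ch))
        for logits, ch in zip(head_char_logits[true_script], true_text)
    ]
    loss_rec = sum(per_char) / len(per_char)
    return loss_script + loss_rec


# Toy word: script logits favour "cyrillic"; each head emits per-position logits.
script_logits = [0.1, 2.0]
head_char_logits = {
    "latin": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    "cyrillic": [[3.0, 0.0, 0.0], [0.0, 0.0, 3.0]],
}
predicted = decode(script_logits, head_char_logits)
loss = multiplexed_loss(script_logits, head_char_logits, "cyrillic", "ав")
```

In this sketch only the head matching the ground-truth script contributes a recognition term, so both terms are minimized together in a single scalar; a real system would compute the logits from shared detector features rather than take them as inputs.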
