Abstract: Recent advances in OCR have shown that an end-to-end
(E2E) training pipeline that includes both detection and
recognition leads to the best results. However, many existing
methods focus primarily on Latin-alphabet languages, and often
only on case-insensitive English characters. In this
paper, we propose an E2E approach, Multiplexed Multilingual
Mask TextSpotter, that performs script identification
at the word level and handles different scripts with different
recognition heads, all while maintaining a unified loss
that simultaneously optimizes script identification and multiple
recognition heads. Experiments show that our method
outperforms the single-head model with a similar number of
parameters in end-to-end recognition tasks, and achieves
state-of-the-art results on the MLT17 and MLT19 joint text
detection and script identification benchmarks. We believe
that our work is a step towards an end-to-end trainable and
scalable multilingual multi-purpose OCR system. Our code
and model will be released.
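
To make the multiplexing idea concrete, the following is a minimal sketch of word-level routing and a combined loss. It is an illustration under stated assumptions, not the released implementation: the function names (`softmax`, `route`, `unified_loss`), the weights `alpha_lang` and `alpha_rec`, and the choice to add the recognition loss of the ground-truth script's head to a cross-entropy script loss are all hypothetical and may differ from the actual formulation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(script_logits, heads):
    """Pick the recognition head for one word.

    Assumption: the word is sent to the head whose script has the
    highest predicted probability (hard routing).
    """
    probs = softmax(script_logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return heads[idx], probs


def unified_loss(script_logits, true_script, rec_losses,
                 alpha_lang=1.0, alpha_rec=1.0):
    """Combine script identification and recognition into one scalar.

    Assumption: cross-entropy on the script prediction plus the
    recognition loss of the head matching the ground-truth script.
    rec_losses[i] is a precomputed loss value for head i.
    """
    probs = softmax(script_logits)
    lang_loss = -math.log(probs[true_script])
    return alpha_lang * lang_loss + alpha_rec * rec_losses[true_script]


if __name__ == "__main__":
    heads = ["latin", "arabic", "chinese"]  # hypothetical head names
    head, probs = route([0.1, 2.0, -1.0], heads)
    print(head, [round(p, 3) for p in probs])
    print(unified_loss([0.1, 2.0, -1.0], 1, [0.9, 0.4, 1.2]))
```

Because both terms are summed into a single scalar, one backward pass would update the script classifier and the selected recognition head together, which is the property the abstract attributes to the unified loss.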