Segmentation-Free Printed Traditional Mongolian OCR Using Sequence to Sequence with Attention Model

Published: 01 Jan 2017, Last Modified: 17 Apr 2025, Venue: ICDAR 2017, License: CC BY-SA 4.0
Abstract: Mongolian Optical Character Recognition (OCR) systems are needed to digitize printed documents and to make Mongolian cultural resources usable. Existing Mongolian OCR systems are based on segmentation. However, segmenting Mongolian text is more difficult than segmenting text in other languages, so these methods are costly and error-prone. In this study, a segmentation-free method for traditional Mongolian word recognition is proposed. Specifically, we formalize the OCR task as a sequence-to-sequence mapping problem, in which the input Mongolian word image is treated as a sequence of image frames and the output text string as a sequence of letters. A sequence-to-sequence model with attention is adopted to solve this problem. Experimental results on a dataset demonstrate the effectiveness of the proposed method.
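The formulation in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the frame height, the slicing along the vertical axis (chosen here because traditional Mongolian is written top to bottom), the use of dot-product scoring, and the function names `image_to_frames` and `attend` are all assumptions made for illustration. In a real model the attended states would be encoder outputs (for example from a recurrent network over the frames) and the query would be the decoder state; raw frames and a hand-written query stand in for both here.

```python
import math


def image_to_frames(image, frame_height):
    """Split a 2D image (a list of pixel rows) into consecutive,
    non-overlapping frames of `frame_height` rows, each flattened
    into one vector. Leftover rows at the bottom are dropped.
    (Hypothetical helper; frame size and orientation are assumptions.)"""
    frames = []
    for top in range(0, len(image) - frame_height + 1, frame_height):
        rows = image[top:top + frame_height]
        frames.append([float(pixel) for row in rows for pixel in row])
    return frames


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, states):
    """One attention step with dot-product scoring (an assumption;
    the abstract does not say which scoring function is used).
    Returns the attention weights and the weighted-sum context vector."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, states))
        for i in range(dim)
    ]
    return weights, context


# Toy 6x2 binary "word image", read top to bottom.
image = [
    [0, 1],
    [1, 0],
    [1, 1],
    [0, 0],
    [1, 0],
    [0, 1],
]

frames = image_to_frames(image, frame_height=2)  # 3 frames, 4 values each
query = [1.0, 0.0, 0.0, 0.0]                     # stand-in for a decoder state
weights, context = attend(query, frames)
```

At each output letter, a decoder would repeat the `attend` step with its current state as the query, so different letters can focus on different frames without any explicit character segmentation.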