A multifaceted evaluation of representation of graphemes for practically effective Bangla OCR

Koushik Roy, Md. Sazzad Hossain, Pritom Kumar Saha, Shadman Rohan, Imranul Ashrafi, Ifty Mohammad Rezwan, Fuad Rahman, B. M. Mainul Hossain, Ahmedul Kabir, Nabeel Mohammed

Published: 2024, Last Modified: 25 Apr 2026. Int. J. Document Anal. Recognit. 2024. CC BY-SA 4.0

Abstract: Bangla Optical Character Recognition (OCR) poses a unique challenge due to the presence of hundreds of diverse conjunct characters, each formed by combining two or more letters. In this paper, we propose two novel grapheme representation methods that improve the recognition of these conjunct characters and the overall performance of Bangla OCR. We use the popular Convolutional Recurrent Neural Network architecture and apply our grapheme representation strategies to design the final labels of the model. Because no large-scale word-level dataset of printed Bangla exists, we created a synthetically generated Bangla corpus of 2 million samples that is representative and sufficiently varied in fonts, domain, and vocabulary size to train our Bangla OCR model. To test the various aspects of our model, we also created six test protocols. Finally, to establish the generalizability of our grapheme representation methods, we trained and tested on external handwriting datasets. Experimental results demonstrate the effectiveness of our approach. Furthermore, our synthetically generated training dataset and the test protocols are made available to serve as benchmarks for future Bangla OCR research.
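To illustrate why conjunct characters complicate label design, the following is a minimal sketch of how a Bangla word might be grouped into grapheme units, so that a conjunct becomes one label instead of several code points. It is not one of the two representation methods proposed in the paper, which the abstract does not specify. The function name `split_graphemes` and the grouping rule are assumptions made for illustration: a consonant that follows a hasanta (U+09CD) is merged into the preceding unit, and dependent vowel signs and other combining marks are attached to their base.

```python
from __future__ import annotations

import unicodedata

HASANTA = "\u09CD"  # Bangla virama; joins consonants into a conjunct


def split_graphemes(word: str) -> list[str]:
    """Group code points into approximate Bangla grapheme units.

    Hypothetical helper for illustration only; it ignores ZWJ/ZWNJ
    and other edge cases that full Unicode segmentation handles.
    """
    units: list[str] = []
    join_next = False  # True when the previous code point was a hasanta
    for ch in word:
        # Mn/Mc cover dependent vowel signs, hasanta, anusvara, and similar marks.
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        if units and (join_next or is_mark):
            units[-1] += ch  # extend the current grapheme unit
        else:
            units.append(ch)  # start a new grapheme unit
        join_next = ch == HASANTA
    return units


# "bishwo" (world): ba + i-sign + sha + hasanta + ba
word = "\u09AC\u09BF\u09B6\u09CD\u09AC"
units = split_graphemes(word)
print(len(word), len(units))  # 5 code points, 2 grapheme units
```

Under this grouping, the output vocabulary of a recognizer would contain one entry per observed unit, which is where the hundreds of conjuncts mentioned in the abstract enlarge the label set.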