Optical Character Recognition (OCR) for Telugu: Database, Algorithm and Application

Chandra Prakash Konkimalla, Srikar Yellapragada

Published: 06 Sept 2018, Last Modified: 30 Sept 2024. Athens, Greece. CC BY 4.0

Abstract: Telugu is a Dravidian language spoken by more than 80 million people worldwide. Optical character recognition (OCR) of the Telugu script has wide-ranging applications in areas such as education, health care, and administration. The beautiful Telugu script, however, is very different from the Latin script used to write Germanic languages such as English and German. This makes transferring OCR solutions built for those languages to Telugu a non-trivial task. To address the challenge of OCR for Telugu, we make three contributions in this work: (i) a database of Telugu characters, (ii) a deep-learning-based OCR algorithm, and (iii) a client-server solution for the online deployment of the algorithm. For the benefit of the Telugu people and the research community, our code has been made freely available at https://gayamtrishal.github.io/OCR Telugu.github.io/.