Ligature-based font size independent OCR for Noori Nastalique writing style

Published: 01 Jan 2017, Last Modified: 24 Feb 2025, ASAR 2017, CC BY-SA 4.0
Abstract: This paper presents a font size independent Optical Character Recognition (OCR) system for Urdu document images. Urdu documents are written in the Noori Nastalique writing style, with different font sizes for normal text and headings. Most current state-of-the-art Urdu OCR techniques support recognition of text in a single font size only. The presented study deals with the recognition of Nastalique text in font sizes 14 to 28. Three recognizers are developed at three font sizes (called pivots): 14, 16 and 22. Urdu document images in the remaining font sizes, such as 18, 20, 24, 26 and 28, are resized to the nearest pivot font size using nearest-neighbor interpolation so that they can be recognized. A detailed analysis has been carried out to compute the optimal scaling factor for each font size to improve recognition results. It has been observed that the recognizers perform better on images resized with the optimal scaling factors than on images resized with the simple computed scaling factors. The system is developed and matured on 1,965 main body classes covering 59,974 high-frequency Urdu words. After maturation, the system has 97.20%, 97.08%, 95.13%, 95.65%, 96.26%, 96.52%, 95.78%, 96.38%, 96.66% main body recognition accuracy for font sizes 14, 16, 18, 20, 24, 26 and 28, respectively.
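The pivot-based resizing step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the list-of-rows image representation, the tie-breaking rule toward the smaller pivot, and the use of the plain font size ratio as the scaling factor are all assumptions. The abstract reports that empirically tuned (optimal) scaling factors outperform this simple ratio but does not give their values, so the scale is left as an explicit parameter.

```python
# Pivot font sizes named in the abstract; one recognizer exists per pivot.
PIVOT_SIZES = (14, 16, 22)


def nearest_pivot(font_size, pivots=PIVOT_SIZES):
    """Return the pivot font size closest to font_size.

    Ties are broken toward the smaller pivot (an assumption; no tie
    occurs for the even sizes 14 to 28 mentioned in the abstract).
    """
    return min(pivots, key=lambda p: (abs(p - font_size), p))


def simple_scale_factor(font_size, pivots=PIVOT_SIZES):
    """Plain ratio of pivot size to source size (assumed 'simple' factor)."""
    return nearest_pivot(font_size, pivots) / font_size


def resize_nearest_neighbor(image, scale):
    """Resize a 2D image (list of equal-length rows) by nearest-neighbor
    interpolation: each output pixel copies the source pixel whose index
    is the output index divided by the scale, truncated and clamped."""
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * scale))
    dst_w = max(1, round(src_w * scale))
    rows = [min(src_h - 1, int(r / scale)) for r in range(dst_h)]
    cols = [min(src_w - 1, int(c / scale)) for c in range(dst_w)]
    return [[image[r][c] for c in cols] for r in rows]


def to_pivot(image, font_size, scale=None):
    """Resize image toward its nearest pivot size.

    Pass a tuned scale to override the simple ratio; returns the
    resized image and the pivot whose recognizer should be used.
    """
    pivot = nearest_pivot(font_size)
    if scale is None:
        scale = pivot / font_size
    return resize_nearest_neighbor(image, scale), pivot
```

For example, `to_pivot(page, 28)` would shrink a size-28 image by a factor of 22/28 and route it to the size-22 recognizer, while `to_pivot(page, 28, scale=s)` lets a separately determined factor `s` replace the simple ratio.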