Separating Indic Scripts with 'matra' - A Precursor to Script Identification in Multi-script Documents

Sk Md Obaidullah; Chitrita Goswami; KC Santosh; Chayan Halder; Nibaran Das; Kaushik Roy

Published: 01 Jan 2016, Last Modified: 07 Nov 2024. Venue: CVIP (1) 2016. License: CC BY-SA 4.0

Abstract: We present a new technique for separating Indic scripts based on the presence of a matra (or shirorekha), using an optimized fractal geometry analysis (FGA) as the sole feature. Separating scripts that have a matra from those that do not can serve as a precursor that eases the subsequent script identification process. In our work, we consider two matra-based scripts, Bangla and Devanagari, as positive samples; the counter samples come from two scripts without a matra, Roman and Urdu. Altogether, we used 1204 document images: 525 matra-based (325 Bangla and 200 Devanagari) and 679 non-matra (370 Roman and 309 Urdu). For experimentation, we used three classifiers, multilayer perceptron (MLP), random forest (RF), and BayesNet (BN), with the aim of selecting the best performer. Over a series of tests, the MLP classifier achieved an average accuracy of 96.44%.
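The abstract does not say how the optimized FGA feature is computed. As a rough illustration of what a fractal-geometry feature can look like, the sketch below estimates a box-counting dimension for a binarized image. This is standard box counting, not the authors' optimized method; the function name `box_counting_dimension`, the input format (a list of rows with truthy ink pixels), and the default box sizes are all assumptions made for this example.

```python
import math


def box_counting_dimension(image, box_sizes=(1, 2, 4, 8, 16)):
    """Estimate the box-counting dimension of a binary image.

    `image` is a list of rows; any truthy value marks an ink pixel.
    Hypothetical helper for illustration, not the paper's optimized FGA.
    """
    ink = [(r, c) for r, row in enumerate(image)
           for c, value in enumerate(row) if value]
    if not ink:
        raise ValueError("image has no foreground pixels")

    xs, ys = [], []
    for size in box_sizes:
        # A box is occupied if at least one ink pixel falls inside it.
        occupied = {(r // size, c // size) for r, c in ink}
        xs.append(math.log(1.0 / size))
        ys.append(math.log(len(occupied)))

    # Least-squares slope of log(count) against log(1 / size).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# A filled region behaves like a 2-D object, a single stroke like a 1-D one.
filled = [[1] * 64 for _ in range(64)]
stroke = [[1 if r == 0 else 0 for _ in range(64)] for r in range(64)]
print(round(box_counting_dimension(filled), 3))
print(round(box_counting_dimension(stroke), 3))
```

A long horizontal stroke such as a matra changes how ink is distributed across scales, which is the kind of property a measure like this is sensitive to. In a pipeline like the one described, a value of this kind would be passed to a classifier such as an MLP; the actual feature and its optimization are as defined in the paper itself.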
