A sequence learning approach for multiple script identification

Published: 2015, Last Modified: 10 Nov 2023, ICDAR 2015, Readers: Everyone
Abstract: In this paper, we present a novel methodology for multiple script identification that exploits the sequence-learning capabilities of Long Short-Term Memory (LSTM) networks. Our method identifies multiple scripts at the text-line level, including cases where two or more scripts are present in the same text line. Unlike traditional techniques, which extract either shape features or bounding boxes of individual characters, the LSTM-based system learns a particular script in a supervised learning framework. Moreover, the system requires neither specific features nor any preprocessing beyond text-line extraction and text-line normalization. Operating on whole text lines, it identifies each character as belonging to a particular script. We have developed a database consisting of English and Greek script, and our system achieved a script recognition accuracy of 98.186% on this dataset.
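The per-character labelling described in the abstract can be illustrated with a small post-processing sketch. It does not reproduce the LSTM itself or the authors' evaluation code; it only assumes that some model has already emitted one script label per character, then groups those labels into contiguous script runs and computes a simple character-level accuracy. The function names `script_runs` and `character_accuracy`, and the short labels "E" (English) and "G" (Greek), are hypothetical choices made for illustration.

```python
from itertools import groupby


def script_runs(labels):
    """Collapse per-character script labels into (script, start, end) runs.

    `labels` is a hypothetical sequence with one script label per character.
    `end` is exclusive, so each run covers labels[start:end].
    """
    runs = []
    position = 0
    for script, group in groupby(labels):
        length = len(list(group))
        runs.append((script, position, position + length))
        position += length
    return runs


def character_accuracy(predicted, truth):
    """Fraction of characters whose predicted script matches the ground truth.

    This is an assumed, simple definition of accuracy; the paper may
    compute its reported figure differently.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not truth:
        return 0.0
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)


if __name__ == "__main__":
    # Made-up labels for one mixed-script text line: "E" = English, "G" = Greek.
    truth = list("EEEEGGGEEEE")
    predicted = list("EEEEGGEEEEE")  # one Greek character mislabelled
    print(script_runs(predicted))
    print(character_accuracy(predicted, truth))
```

Grouping labels into runs is one plausible way to turn character-level decisions into script segments within a single text line; whether the original system applies any such step is not stated in the abstract.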