Automatic Selection of Binarization Method for Robust OCR

Tanushyam Chattopadhyay, V. Ramu Reddy, Utpal Garain

2013 (modified: 09 Nov 2021), ICDAR 2013

Abstract: Many algorithms are now available for the same task in document image analysis (DIA), e.g. binarization, page segmentation, and character recognition, and choosing an algorithm for a particular task is often a non-trivial problem. This paper proposes a model for automatically selecting the correct algorithm(s) for a given problem. Binarization is taken as a reference task to illustrate the proposed approach. Several previously unexplored issues are addressed in this work. For example, a single method may not be good for binarizing an entire document, whereas a particular method may produce the desired result for a particular region. Therefore, for a given document image, our model selects a set of one or more binarization techniques suitable for different regions of the document. This selection is completely automatic and guided by machine learning approaches. A completely automatic way of generating the annotated data for training the learning algorithms is also a novel contribution of this work. The approach is evaluated on the ICDAR 2003 Robust Reading data set, and the results highlight its potential for automatically selecting the correct DIA algorithm(s) from a set of several alternatives.
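
The idea of choosing a binarization method separately for each region can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the candidate methods, the region features, or the learning algorithm, so everything below is assumed for illustration. The two thresholding rules, the fixed tiling, the (mean, standard deviation) features, and the hand-written `select_method` rule (a placeholder for a trained classifier) are all hypothetical.

```python
import numpy as np


def mean_threshold(region):
    """Binarize a region at its mean intensity (illustrative candidate 1)."""
    return (region > region.mean()).astype(np.uint8)


def midrange_threshold(region):
    """Binarize at the midpoint of the darkest and brightest pixel
    (illustrative candidate 2)."""
    t = (float(region.min()) + float(region.max())) / 2.0
    return (region > t).astype(np.uint8)


# Hypothetical pool of candidate methods; the paper's actual pool is not
# given in the abstract.
METHODS = {"mean": mean_threshold, "midrange": midrange_threshold}


def region_features(region):
    """Assumed features: mean and standard deviation of the intensities."""
    return np.array([region.mean(), region.std()])


def select_method(features):
    """Placeholder for a trained classifier that maps region features to a
    method name. The fixed rule here is an assumption, not a learned model."""
    _, std = features
    return "midrange" if std < 40.0 else "mean"


def binarize_by_region(image, tile=32):
    """Split a grayscale image into tiles, pick a method per tile, and
    assemble the binarized output. Returns (binary image, chosen methods)."""
    out = np.zeros_like(image, dtype=np.uint8)
    chosen = {}
    height, width = image.shape
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            region = image[y:y + tile, x:x + tile]
            name = select_method(region_features(region))
            out[y:y + tile, x:x + tile] = METHODS[name](region)
            chosen[(y, x)] = name
    return out, chosen
```

In a learned setting, `select_method` would be replaced by a model trained on regions labelled with their best-performing method; how those labels are generated automatically is the contribution the abstract describes, and it is not reproduced here.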
