Abstract: Classifying historical document images is challenging because their content varies widely and the documents are often degraded. Footnotes are essential for scholars who analyze and investigate historical documents. In this work, we propose a novel classification method for detecting and segmenting footnotes in document images. The method uses horizontal histograms of text lines as inputs to a 1D Convolutional Neural Network (CNN). Experiments on a dataset of historical documents show that the method handles the high variability of footnotes effectively, even with a small training set. It achieved an overall F-measure of 56.36% and a precision of 89.76%, significantly outperforming existing approaches for this task.
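
The input representation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a binarized text-line image (1 = ink, 0 = background), reads "horizontal histogram" as the ink count per column along the width of the line (the per-row reading is available through a flag), and uses a hand-picked kernel in place of learned CNN weights. All function names are hypothetical.

```python
def projection_profile(image, per_column=True):
    """Count ink pixels (value 1) per column, or per row, of a binary image."""
    if per_column:
        return [sum(col) for col in zip(*image)]
    return [sum(row) for row in image]


def resample(profile, length):
    """Nearest-neighbour resampling so every text line yields the same input size."""
    n = len(profile)
    return [profile[min(n - 1, i * n // length)] for i in range(length)]


def normalize(profile):
    """Scale values into [0, 1]; an all-zero profile is returned as zeros."""
    peak = max(profile)
    return [v / peak for v in profile] if peak else [float(v) for v in profile]


def conv1d_relu(signal, kernel, bias=0.0):
    """One 'valid' 1D convolution (cross-correlation, as in CNN layers) plus ReLU."""
    k = len(kernel)
    out = []
    for i in range(len(signal) - k + 1):
        s = sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        out.append(max(0.0, s))
    return out


# Toy 3x6 binary "text line" (assumed data, for illustration only).
line = [
    [0, 1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 1],
]
profile = projection_profile(line)                # ink count per column
features = normalize(resample(profile, 6))        # fixed-length, scaled input
activations = conv1d_relu(features, [1.0, -1.0])  # hand-picked, not learned
```

In a full model, several such convolutional layers with learned kernels would feed a classifier that labels each text line as footnote or body text. The layer count, kernel sizes, and input length are not stated in the abstract and are left open here.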