Abstract: In this paper, we present a novel approach based on convolutional neural networks (CNNs) to estimate the paper format of digitized document images, expressed as their resolution in pixels per inch (PPI). This format information is often required by commercial document analysis software, and a correct estimate helps high-level tasks such as OCR and layout analysis. The contribution of this work is two-fold. First, it presents an algorithm for the estimation of paper formats. Second, it provides the first publicly available collection of documents (aggregated from public datasets) that can serve as a research benchmark. The collection is a mixture of modern and historical documents with PPI values ranging from 177 to 711. The task is modeled as a regression task, which yields more flexible results than a classification task (one class per format, e.g., A3, A4). For example, if an unknown format is presented to the network, it still returns a useful output. Furthermore, additional categories can easily be learned by curriculum learning without modifying the network structure itself. On the proposed dataset, the network estimates the PPI values with an average deviation from the ground truth of only 14.8 PPI. On a private dataset stemming from health insurance companies, the average deviation is 6.8 PPI.
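As a rough illustration of how a regressed PPI value relates to paper formats, and of how an average deviation could be computed, consider the minimal Python sketch below. It is not the authors' implementation: the function names are hypothetical, the paper sizes are the standard ISO 216 dimensions, and "average deviation" is assumed here to mean the mean absolute difference between predicted and ground-truth PPI.

```python
MM_PER_INCH = 25.4

# ISO 216 paper sizes in millimetres (width, height), portrait orientation.
PAPER_SIZES_MM = {"A3": (297.0, 420.0), "A4": (210.0, 297.0)}


def ppi_from_format(width_px, paper):
    """PPI of a scan that is width_px pixels wide and covers the full paper width."""
    width_mm, _ = PAPER_SIZES_MM[paper]
    return width_px / (width_mm / MM_PER_INCH)


def nearest_format(width_px, ppi):
    """Map an estimated PPI back to the closest known format by physical width."""
    width_mm = width_px / ppi * MM_PER_INCH
    return min(PAPER_SIZES_MM, key=lambda name: abs(PAPER_SIZES_MM[name][0] - width_mm))


def mean_absolute_deviation(predicted, ground_truth):
    """Assumed metric: mean absolute difference between predicted and true PPI."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - g) for p, g in zip(predicted, ground_truth)) / len(predicted)
```

Under these assumptions, a continuous PPI estimate remains usable even for a format that is not in the table, since the physical page width can still be derived from it. A classifier with one class per format has no comparable output for such a page.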