Abstract: Automatic document authentication is a complex task. The aim is to prove that the document at hand is not fraudulent. This can be achieved through a fingerprint based on the document’s content. To this end, it is necessary to analyze and describe the constituent elements of the document: graphics, text, tables, and the layout. In this context, this article focuses on layout description and authentication. The Delaunay layout descriptor (Eskenazi et al., 2015) is a robust descriptor that allows fast comparison and authentication of layouts based on the spatial relationships of the regions composing the document. Because page layout description requires segmenting the document into regions, the Delaunay layout descriptor cannot match an authentic copy with the original when the two documents are segmented into different numbers of regions. This is mainly due to the use of a global matching approach. To overcome this drawback, we present a new refined matching algorithm for the Delaunay layout descriptor, which combines global and local matching. Furthermore, we present a storage and retrieval scheme to match a Delaunay layout descriptor efficiently against a layout database. In addition to being able to compare layouts with different numbers of segmented regions, the proposed method outperforms related work. We obtain false negative and false positive rates of 0.011 and 0.0, respectively, on a data set of printed and scanned layouts, and of 0.3978 and 0.0029 on a data set of real documents.