Versatile Layout Understanding via Conjugate Graph

Hervé Déjean, Jean-Luc Meunier, Animesh Prasad, Stéphane Clinchant

12 Nov 2021 (modified: 15 Nov 2021), OpenReview Archive Direct Upload

Abstract: Recent advances in document understanding, especially text recognition, provide new opportunities to address the page segmentation problem. In this paper, we propose a method to group text lines into semantic objects. We model a page as a graph where nodes represent text lines and edges represent their geometric relations. The logical segmentation task then consists of identifying all text lines that belong to a given logical sub-division of the page. We model this task as categorizing each edge as relevant or not for building the targeted sub-division (sub-graph). This edge categorization is performed using structured machine learning algorithms (graph Conditional Random Field and Edge Convolutional Network). After edge classification, we aggregate the nodes using a connected-components approach. This simple approach shows very robust results across various layouts and various page sub-divisions. We experiment on table segmentation into multiple sub-divisions (rows, columns, and cells) and on minutes segmentation into resolutions. Our approach, which is oblivious to both the sub-division and the page layout, shows near-par performance compared to task-dedicated approaches and even outperforms them in certain setups.
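
The aggregation step described in the abstract (keep the edges classified as relevant, then take connected components) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `group_lines`, its parameters, and the boolean `keep` list standing in for the output of the edge classifier (CRF or Edge Convolutional Network) are all assumptions made for illustration.

```python
from collections import defaultdict


def group_lines(num_lines, edges, keep):
    """Group text lines into clusters via connected components.

    num_lines: number of text lines (nodes are 0 .. num_lines - 1).
    edges:     list of (u, v) pairs, one per geometric relation.
    keep:      list of booleans parallel to `edges`; True means the
               (hypothetical) edge classifier judged the edge relevant.
    """
    # Union-find forest: each node starts as its own root.
    parent = list(range(num_lines))

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the endpoints of every edge classified as relevant.
    for (u, v), relevant in zip(edges, keep):
        if relevant:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_v] = root_u

    # Collect nodes by root; each group is one predicted sub-division.
    clusters = defaultdict(list)
    for node in range(num_lines):
        clusters[find(node)].append(node)
    return sorted(clusters.values())


# Toy page: five text lines chained top to bottom, with the edge
# between lines 1 and 2 classified as not relevant.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
keep = [True, False, True, True]
print(group_lines(5, edges, keep))  # [[0, 1], [2, 3, 4]]
```

In this toy case the single rejected edge splits the page into two objects. The same grouping routine would apply whether the target sub-division is a row, a column, a cell, or a resolution, since only the edge labels change.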
