Abstract: Recent advances in document understanding, especially text recognition, provide new opportunities to address the page segmentation problem. In this paper, we propose a method that groups text lines into semantic objects. We model a page as a graph in which nodes represent text lines and edges represent their geometric relations. The logical segmentation task then consists of identifying all text lines that belong to a given logical sub-division of the page. We cast this task as classifying each edge as relevant or not for building the targeted sub-division (a sub-graph). This edge classification is performed with structured machine learning algorithms (a graph Conditional Random Field and an Edge Convolutional Network). After edge classification, we aggregate the nodes with a connected-components-based approach. This simple approach yields robust results across various layouts and various page sub-divisions. We experiment on table segmentation into multiple sub-divisions (rows, columns, and cells) and on the segmentation of minutes into resolutions. Our approach, which is oblivious to both the sub-division type and the page layout, performs nearly on par with task-dedicated approaches and even outperforms them in certain setups.
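The aggregation step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `group_text_lines`, the `keep_prob` edge scores, and the `threshold` of 0.5 are hypothetical, and the scores stand in for the output of whichever edge classifier is used (graph CRF or Edge Convolutional Network). The sketch only shows how edges classified as relevant can be merged into connected components with a union-find structure.

```python
from collections import defaultdict


def group_text_lines(num_lines, edges, keep_prob, threshold=0.5):
    """Group text-line nodes into clusters from per-edge relevance scores.

    num_lines: number of text lines (nodes), indexed 0..num_lines-1.
    edges:     list of (a, b) node-index pairs (geometric neighbours).
    keep_prob: hypothetical classifier score per edge, same order as edges.
    threshold: assumed cut-off above which an edge counts as relevant.
    """
    parent = list(range(num_lines))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the endpoints of every edge classified as relevant.
    for (a, b), p in zip(edges, keep_prob):
        if p >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Each connected component is one predicted sub-division (e.g. a row).
    clusters = defaultdict(list)
    for node in range(num_lines):
        clusters[find(node)].append(node)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Five text lines in a chain; the edge (2, 3) is scored as irrelevant.
    edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
    scores = [0.9, 0.8, 0.1, 0.7]
    print(group_text_lines(5, edges, scores))
```

In this toy input, dropping the low-scored edge between lines 2 and 3 splits the chain into two groups, while a text line with no relevant edge would end up as a singleton cluster.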