Multi-Level Graph Convolutional Network for Document Information Extraction

Published: 01 Jan 2024, Last Modified: 11 Apr 2025. Venue: ICTAI 2024. License: CC BY-SA 4.0
Abstract: Document information extraction aims to identify and extract entities and other essential information from documents. Its performance depends heavily on the learned relationship between text and its corresponding layout, which existing methods do not fully exploit. We therefore propose a multi-level graph convolutional network that improves the modeling of document information. First, we integrate text embeddings with the corresponding image and layout features to form text features, and use them to construct a text-level graph from which graph convolutions extract semantic relationships as word features. We then construct a layout-level graph with region layout features as nodes and extract structural relationships through further graph convolutions. This multi-level graph structure allows our model to fuse fine-grained word features with coarse-grained region features for effective sequence labeling. Experiments on several datasets demonstrate the effectiveness of our method, and the results show that, with the aid of multi-level features, our model outperforms state-of-the-art methods.
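The two-level pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the propagation rule, the graph construction, or the fusion operator. The sketch assumes the common symmetric-normalized graph convolution, concatenation for combining text, image, and layout features, and concatenation of each word's feature with the feature of its region. All names (`normalize_adjacency`, `gcn_layer`, `multi_level_features`, `word_to_region`), the dimensions, and the random weights are hypothetical.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj, feats, weight):
    """One graph convolution (assumed form): ReLU(A_norm @ X @ W)."""
    return np.maximum(normalize_adjacency(adj) @ feats @ weight, 0.0)

def multi_level_features(text_emb, image_feat, layout_feat, word_adj,
                         region_layout, region_adj, word_to_region,
                         w_word, w_region):
    # Text features: text embeddings combined with image and layout features
    # (concatenation is an assumption; the abstract only says "integrate").
    text_feats = np.concatenate([text_emb, image_feat, layout_feat], axis=1)
    # Text-level graph: graph convolution yields fine-grained word features.
    word_feats = gcn_layer(word_adj, text_feats, w_word)
    # Layout-level graph: nodes are regions described by layout features.
    region_feats = gcn_layer(region_adj, region_layout, w_region)
    # Fusion (assumed): append each word's region feature to its word feature.
    return np.concatenate([word_feats, region_feats[word_to_region]], axis=1)

# Toy input: 5 words grouped into 2 regions, random features and weights.
rng = np.random.default_rng(0)
n_words, n_regions = 5, 2
text_emb = rng.normal(size=(n_words, 6))
image_feat = rng.normal(size=(n_words, 3))
layout_feat = rng.normal(size=(n_words, 4))
word_adj = np.ones((n_words, n_words)) - np.eye(n_words)  # fully connected words
region_layout = rng.normal(size=(n_regions, 4))
region_adj = np.array([[0.0, 1.0], [1.0, 0.0]])
word_to_region = np.array([0, 0, 0, 1, 1])
w_word = rng.normal(size=(13, 8))    # 13 = 6 + 3 + 4 concatenated input dims
w_region = rng.normal(size=(4, 4))

fused = multi_level_features(text_emb, image_feat, layout_feat, word_adj,
                             region_layout, region_adj, word_to_region,
                             w_word, w_region)
print(fused.shape)  # (5, 12)
```

Each row of `fused` would then be passed to a sequence-labeling head (for example a per-token classifier), which is omitted here because the abstract does not describe it.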