BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding

Published: 01 Nov 2019, Last Modified: 21 Apr 2024, DI 2019
Keywords: document representation, contextualized embedding, information extraction from documents, tabulated data recognition and extraction, document intelligence
Abstract: For understanding generic documents, information like font sizes, column layout, and generally the positioning of words may carry semantic information that is crucial for solving a downstream document intelligence task. Our novel BERTgrid, which is based on Chargrid by Katti et al. (2018), represents a document as a grid of contextualized word piece embedding vectors, thereby making its spatial structure and semantics accessible to the processing neural network. The contextualized embedding vectors are retrieved from a BERT language model. We use BERTgrid in combination with a fully convolutional network on a semantic instance segmentation task for extracting fields from invoices. We demonstrate its performance on tabulated line item and document header field extraction.
TL;DR: Grid-based document representation with contextualized embedding vectors for documents with 2D layouts
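The abstract describes BERTgrid as a grid in which each pixel covered by a word piece holds that token's contextualized embedding vector. The sketch below illustrates this idea under stated assumptions: word-piece tokens with pixel-space bounding boxes and pre-computed BERT embeddings are taken as given, and the function name `make_bertgrid`, the dictionary layout of `tokens`, and all shapes are illustrative choices, not the authors' reference implementation.

```python
# Minimal sketch of building a BERTgrid tensor, assuming word-piece tokens
# with bounding boxes in grid coordinates and pre-computed contextualized
# embeddings (e.g., from a BERT encoder) are already available.
# All names and shapes are illustrative assumptions.
import numpy as np

def make_bertgrid(tokens, grid_height, grid_width, emb_dim):
    """tokens: list of dicts with 'bbox' = (x0, y0, x1, y1) and
    'embedding' = np.ndarray of shape (emb_dim,)."""
    grid = np.zeros((grid_height, grid_width, emb_dim), dtype=np.float32)
    for tok in tokens:
        x0, y0, x1, y1 = tok["bbox"]
        # Broadcast the token's contextualized embedding over the pixels
        # covered by its bounding box; background pixels stay all-zero.
        grid[y0:y1, x0:x1, :] = tok["embedding"]
    return grid

# Example: two word pieces on a small 8x16 grid with 4-dim embeddings.
tokens = [
    {"bbox": (0, 0, 5, 2), "embedding": np.random.randn(4).astype(np.float32)},
    {"bbox": (6, 0, 12, 2), "embedding": np.random.randn(4).astype(np.float32)},
]
bertgrid = make_bertgrid(tokens, grid_height=8, grid_width=16, emb_dim=4)
print(bertgrid.shape)  # (8, 16, 4)
```

In the paper's setup, a tensor of this form serves as input to a fully convolutional network for semantic instance segmentation; the downsampling factor from document pixels to grid cells is a design choice not fixed by this sketch.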
Community Implementations: 2 code implementations (https://www.catalyzex.com/paper/arxiv:1909.04948/code)