Evaluating Transformer-based Models in the Information Extraction of Fiscal Documents

João Macedo; Byron L. D. Bezerra; Estanislau Lima; Alysson Soares; Celso A. M. Lopes Junior; Cleber Zanchettin

Published: 01 Jan 2023, Last Modified: 30 Sept 2024; LA-CCI 2023; CC BY-SA 4.0

Abstract: The use of artificial intelligence to extract information from documents is essential for process automation and information mining. In this context, fiscal documents are an important target due to their high volume, variety, complexity, and inconsistent structure. In this work, we perform a comparative analysis of three state-of-the-art key information extraction models: LAMBERT, LayoutLM, and LayoutLMv2. We assess their performance using different metrics on the CORD, Brazilian Invoice, and Brazilian Receipt datasets. LayoutLMv2 showed the best performance but also the slowest inference time and the largest model size. LayoutLM performed slightly better than LAMBERT, but in most scenarios the two are interchangeable performance-wise.
