Measuring the State of Document Understanding

08 Jun 2021 (modified: 24 May 2023) · Submitted to NeurIPS 2021 Datasets and Benchmarks Track (Round 1)
Keywords: Document Understanding, Multi-modal Models, Language Models, NLP, Multimodal Data, Key Information Extraction, Question Answering, Information Extraction, Table Comprehension, KIE, NLI, Visual QA, Layout-aware Language Models
TL;DR: Description of a benchmark spanning multiple tasks related to understanding multi-modal documents with complex layouts.
Abstract: Understanding documents with rich layouts plays a vital role in digitization and hyper-automation but remains a challenging topic in the NLP research community. Additionally, the lack of a commonly accepted benchmark has made it difficult to quantify progress in the domain. To empower research in Document Understanding, we present a suite of tasks that fulfill strict quality, difficulty, and licensing criteria. The benchmark includes Visual Question Answering, Key Information Extraction, and Machine Reading Comprehension tasks over a variety of document domains and layouts, featuring tables, graphs, lists, and infographics. The current study reports systematic baselines making use of recent advances in layout-aware language modeling. To support adoption by other researchers, both the benchmarks and reference implementations will be released shortly.
Supplementary Material: zip