Abstract: Key Information Extraction (KIE) from visually rich documents (VRDs) is typically framed either as fine-grained token classification or as coarse-grained entity retrieval. Token-level models effectively capture the spatial and visual information associated with document spans, while entity-level models excel at modeling logical dependencies and align more closely with real-world use cases.
This work introduces PM3-KIE, a probabilistic, multi-task meta-model that integrates the fine-grained and coarse-grained approaches to leverage the strengths of both paradigms. The model makes two key contributions: domain-specific schema constraints that enforce logical consistency and mitigate extraction errors, and the use of large language models (LLMs) to validate extractions by assessing their semantic plausibility.
Experimental evaluation on the public VRDU dataset shows that PM3-KIE significantly outperforms three state-of-the-art models and a stacked ensemble, achieving a 2.5% improvement in F1 score. These results highlight the model's efficacy in unifying fine- and coarse-grained representations for KIE.
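The abstract does not specify how PM3-KIE combines its components, so the following is only a toy sketch of the general idea, not the authors' method. It assumes each candidate extraction carries a fine-grained (token-level) score and a coarse-grained (entity-level) score, fuses them by log-linear pooling (a weighted geometric mean, swapped in here purely for illustration), enforces a hypothetical schema constraint (single-valued fields keep only their best candidate), and calls a placeholder `is_plausible` hook where an LLM plausibility check could go. The names `Candidate`, `fuse`, `decode`, `SCHEMA`, the field names, the weight `w`, and the threshold are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Candidate:
    field: str       # schema field name (hypothetical, e.g. "invoice_date")
    text: str        # extracted span text
    p_token: float   # fine-grained score, e.g. mean token-classifier probability
    p_entity: float  # coarse-grained score from an entity-retrieval model


# Hypothetical schema: whether a field may hold only one value per document.
SCHEMA = {"invoice_date": {"single": True}, "line_item": {"single": False}}


def fuse(c: Candidate, w: float = 0.5, eps: float = 1e-12) -> float:
    """Log-linear pooling: weighted geometric mean of the two model scores."""
    return math.exp(w * math.log(max(c.p_token, eps))
                    + (1 - w) * math.log(max(c.p_entity, eps)))


def decode(cands: List[Candidate],
           threshold: float = 0.5,
           w: float = 0.5,
           is_plausible: Callable[[Candidate], bool] = lambda c: True,
           ) -> Dict[str, List[str]]:
    """Keep plausible candidates above threshold, honoring single-value fields.

    `is_plausible` is a placeholder for an LLM-based semantic check (assumed).
    """
    scored = sorted(((fuse(c, w), c) for c in cands),
                    key=lambda sc: sc[0], reverse=True)
    result: Dict[str, List[str]] = {}
    for score, c in scored:
        if score < threshold or not is_plausible(c):
            continue
        if SCHEMA[c.field]["single"]:
            # First (highest-scoring) surviving candidate wins; later ones are ignored.
            result.setdefault(c.field, [c.text])
        else:
            result.setdefault(c.field, []).append(c.text)
    return result


cands = [
    Candidate("invoice_date", "2021-03-04", 0.9, 0.8),
    Candidate("invoice_date", "2021-04-03", 0.7, 0.6),
    Candidate("line_item", "Widget", 0.8, 0.9),
    Candidate("line_item", "Gadget", 0.6, 0.7),
    Candidate("line_item", "Noise", 0.2, 0.3),
]
print(decode(cands))
```

With the default settings, the lower-scoring date is dropped by the single-value constraint and "Noise" falls below the threshold; if the placeholder validator rejects the top date, the runner-up date takes its place.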
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: Information Extraction, NLP Applications, Machine Learning for NLP
Contribution Types: NLP engineering experiment, Data analysis, Theory
Languages Studied: English
Submission Number: 439