Deeper comprehension of Visually-Rich Document Understanding: key insights, challenges, and future directions

ACL ARR 2025 February Submission1886 Authors

14 Feb 2025 (modified: 09 May 2025) | ACL ARR 2025 February Submission | CC BY 4.0
Abstract: The field of visually-rich document understanding, which involves interacting with visually-rich documents (whether scanned or born-digital), is rapidly evolving and still lacks consensus on several key aspects of the processing pipeline. In this work, we provide a comprehensive overview of state-of-the-art approaches, emphasizing their strengths and limitations, pointing out the main challenges in the field, and proposing promising research directions.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Visually Rich Documents Understanding, Document AI, VRDU, VRD
Contribution Types: Surveys
Languages Studied: English
Submission Number: 1886