DEEN: Detection Encoder for Document Layout Analysis

ACL ARR 2025 May Submission1700 Authors

18 May 2025 (modified: 03 Jul 2025) | ACL ARR 2025 May Submission | License: CC BY 4.0
Abstract: Document Layout Analysis is typically formulated as an object detection task. However, most existing approaches are adapted from general-purpose detection frameworks and overlook the fundamental structural differences between document images and natural images. Because documents are designed to be read by humans, document images are essentially two-dimensional and free from occlusion. Based on this observation, we propose DEtection ENcoder (DEEN), which reformulates document layout analysis as a graph connectivity prediction task, thereby eliminating the need for both Non-Maximum Suppression (NMS) and confidence thresholding in post-processing. To model high-resolution feature maps efficiently, DEEN combines global sparse attention with local dense attention, yielding a unified representation of both the overall layout and its fine-grained details. Since DEEN does not rely on confidence scores, we evaluate it under two settings: one that favors confidence-based models, and another that simulates real-world usage. DEEN achieves competitive performance on three structurally diverse datasets, demonstrating strong generalization.
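The abstract does not say how connectivity predictions are turned into layout regions. The following is a minimal sketch of one plausible reading, not the paper's method. It assumes that each graph node is a feature-map cell already known to lie inside some layout element, and that the model emits a binary, symmetric "same region" decision for every pair of nodes (e.g. an argmax over link/no-link, so no score threshold is tuned). Under those assumptions, regions fall out as connected components, one bounding box per component. The function name `decode_regions` and its input layout are hypothetical.

```python
from collections import deque
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in feature-map cell units.
Box = Tuple[int, int, int, int]


def decode_regions(cells: List[Tuple[int, int]], edges: List[List[bool]]) -> List[Box]:
    """Group nodes into layout regions via connected components (illustrative sketch).

    Assumptions (not from the paper):
      cells[i] is the (row, col) position of node i on the feature map.
      edges[i][j] is True when nodes i and j are predicted to belong to the
      same region; the matrix is assumed symmetric and already binarized.
    """
    n = len(cells)
    seen = [False] * n
    boxes: List[Box] = []
    for start in range(n):
        if seen[start]:
            continue
        # Breadth-first search collects every node reachable from `start`.
        seen[start] = True
        queue = deque([start])
        component = []
        while queue:
            i = queue.popleft()
            component.append(i)
            for j in range(n):
                if edges[i][j] and not seen[j]:
                    seen[j] = True
                    queue.append(j)
        rows = [cells[i][0] for i in component]
        cols = [cells[i][1] for i in component]
        # One box per component: columns map to x, rows map to y (exclusive max).
        boxes.append((min(cols), min(rows), max(cols) + 1, max(rows) + 1))
    return boxes


if __name__ == "__main__":
    # Toy 2x3 grid: the top row is one element, the bottom row splits into two.
    cells = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
    links = {(0, 1), (1, 2), (4, 5)}
    edges = [[(i, j) in links or (j, i) in links for j in range(6)] for i in range(6)]
    print(decode_regions(cells, edges))  # → [(0, 0, 3, 1), (0, 1, 1, 2), (1, 1, 3, 2)]
```

In this sketch, every node lands in exactly one component, so no two outputs can cover the same set of cells. This illustrates why a connectivity formulation leaves no duplicate detections for NMS to suppress. DEEN's actual graph construction and decoding may differ.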
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: Document Layout Analysis
Contribution Types: Publicly available software and/or pre-trained models
Languages Studied: The dataset used contains multiple languages, including English, Arabic, and Chinese.
Submission Number: 1700