SudokuFill: A Multi-Agent Progressive Filling Framework for Document-Level Scientific Information Extraction

ACL ARR 2026 January Submission8424 Authors

06 Jan 2026 (modified: 20 Mar 2026) | ACL ARR 2026 January Submission | Readers: Everyone | License: CC BY 4.0

Keywords: Scientific Information Extraction, Sudoku Filling, Multi-agent Debate

Abstract: Scientific information extraction (SciIE) is a key bottleneck in turning unstructured papers into computable knowledge bases, yet most existing systems still follow a "local extraction, then global assembly" paradigm. This workflow is inherently lossy: by extracting fields in isolation, it breaks global correlations and discards high-confidence signals that could otherwise be reused as internal supervision. As a result, systems repeatedly restart from scratch, especially on long, multimodal scientific documents. In this paper, we propose a different view: SciIE should be solved as a progressive filling problem, similar to solving a Sudoku. Once a field is filled with high confidence, it should act as a constraint that guides the remaining uncertain fields. Based on this idea, we introduce SudokuFill, a multi-agent framework that maintains a Global Filling State and performs priority scheduling to establish reliable anchors first, then reuses them as internal supervision for iterative deliberation over harder fields. Evaluated on a specialized document-level adjuvant dataset, our framework achieves a state-of-the-art score of 51.83% on our benchmark. Crucially, SudokuFill enables a 7B model to outperform vanilla GPT-4o, indicating that structured architectural reasoning can effectively compensate for parameter scale.
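The progressive filling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `progressive_fill`, the extractor signature, the confidence threshold, and the field names (`adjuvant_name`, `dose`, `route`) are all hypothetical, chosen only to show how a confidently filled field can act as an anchor for the remaining ones. The multi-agent deliberation over hard fields is not modeled here; unresolved fields are simply returned.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical extractor: (field, current anchors) -> (value, confidence in [0, 1]).
Extractor = Callable[[str, Dict[str, str]], Tuple[str, float]]


def progressive_fill(
    fields: List[str],
    extract: Extractor,
    threshold: float = 0.8,  # assumed acceptance threshold, not from the paper
) -> Tuple[Dict[str, str], List[str]]:
    """Fill fields one at a time, most confident first, reusing filled ones as anchors."""
    filled: Dict[str, str] = {}   # stands in for the "Global Filling State"
    pending = list(fields)
    while pending:
        # Re-score every pending field given the anchors established so far.
        proposals = {f: extract(f, dict(filled)) for f in pending}
        # Priority scheduling: pick the single most confident proposal.
        best = max(pending, key=lambda f: proposals[f][1])
        value, confidence = proposals[best]
        if confidence < threshold:
            break  # nothing reliable left; remaining fields need deliberation
        filled[best] = value
        pending.remove(best)
    return filled, pending


def toy_extract(field: str, anchors: Dict[str, str]) -> Tuple[str, float]:
    """Toy extractor: 'dose' only becomes confident once 'adjuvant_name' is anchored."""
    if field == "adjuvant_name":
        return ("alum", 0.95)
    if field == "dose":
        return ("50 ug", 0.9 if "adjuvant_name" in anchors else 0.4)
    return ("unknown", 0.1)


filled, unresolved = progressive_fill(["dose", "adjuvant_name", "route"], toy_extract)
print(filled)      # fields accepted as anchors
print(unresolved)  # fields left for further deliberation
```

In this toy run, `dose` is too uncertain on its own and is only accepted after `adjuvant_name` has been anchored, while `route` never reaches the threshold and stays unresolved.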

Paper Type: Long

Research Area: Information Extraction and Retrieval

Research Area Keywords: Information Extraction

Contribution Types: NLP engineering experiment, Data resources

Languages Studied: English

Submission Number: 8424
