CycleIE: Robust Document Information Extraction through Iterative Verification and Refinement

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Information Extraction, Document Question Answering, Iterative Reasoning
Abstract: In document AI, reliable analytics require converting long, noisy (often multi-document) corpora into heterogeneous structured data, such as tables for numerical fields, graphs for entity–relation structures, trees for hierarchies, and faithful text chunks. Yet one-pass LLM extraction often yields incomplete or inconsistent structures because it performs no explicit verification and has no opportunity to revise earlier choices. We present CycleIE, an iterative information extraction (IE) framework that closes the loop between reasoning and acting by coupling ReAct with Monte Carlo Tree Search (MCTS). CycleIE employs a multi-agent workflow, orchestrated through ReAct and optimized via MCTS, that iteratively retrieves, structures, extracts, and refines content under verification guidance. This design treats extraction as a search process with feedback, enabling systematic correction of the omissions and inconsistencies that defeat one-pass methods. Because it operates directly over user-provided documents, CycleIE is orthogonal to retrieval-augmented generation (RAG). Experiments on a challenging document-based QA benchmark show that CycleIE delivers relative improvements of more than 10% in extraction quality over strong one-pass baselines, with the largest gains on lengthy or multi-document contexts.
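The abstract frames extraction as a search process guided by verifier feedback. The following is a minimal sketch of that general idea under loudly stated assumptions; it is not CycleIE's implementation. A state is a candidate record (one value per field), an action is an edit that may overwrite an earlier choice, the reward comes from a toy verifier that checks whether each value appears verbatim in the document, and plain UCT-based MCTS searches a fixed budget of edits. The document, schema, candidate lists, and every function name (`verify`, `legal_actions`, `mcts_refine`, and so on) are hypothetical; in CycleIE the proposals and verification would presumably come from ReAct-orchestrated LLM agents, which this sketch replaces with fixed lists and substring matching.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical toy inputs (not from the paper): a document, a target schema,
# and noisy candidate values standing in for LLM extraction proposals.
DOC = [
    "Acme Corp reported revenue of 120 million in 2023.",
    "The CEO of Acme Corp is Jane Doe.",
    "Acme Corp is headquartered in Berlin.",
]
FIELDS = ["revenue", "ceo", "hq"]
CANDIDATES = {
    "revenue": ["150 million", "120 million"],
    "ceo": ["John Smith", "Jane Doe"],
    "hq": ["Paris", "Berlin"],
}

State = Tuple[Optional[str], ...]  # one value (or None) per entry in FIELDS
Action = Tuple[int, str]           # (field index, new value)


def verify(state: State) -> float:
    """Toy verifier: fraction of fields filled with a value found verbatim in DOC."""
    text = " ".join(DOC)
    return sum(v is not None and v in text for v in state) / len(FIELDS)


def legal_actions(state: State) -> List[Action]:
    """Refinement edits: any field, including an earlier choice, may be overwritten."""
    return [(i, c) for i, f in enumerate(FIELDS) for c in CANDIDATES[f] if state[i] != c]


def apply_action(state: State, action: Action) -> State:
    i, value = action
    return state[:i] + (value,) + state[i + 1:]


@dataclass
class Node:
    state: State
    depth: int
    parent: Optional["Node"] = None
    untried: List[Action] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total: float = 0.0


def uct_child(node: Node, c: float = 1.4) -> Node:
    """Pick the child maximizing the UCB1 score (mean reward + exploration bonus)."""
    return max(node.children, key=lambda ch: ch.total / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))


def mcts_refine(initial: State, max_depth: int = 3, iterations: int = 1000,
                seed: int = 0) -> Tuple[State, float]:
    """UCT search over edit sequences; returns the best-verified state seen and its score."""
    rng = random.Random(seed)
    root = Node(initial, 0, untried=legal_actions(initial))
    best_state, best_score = initial, verify(initial)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and has children.
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: turn one untried edit into a child (leaves at max_depth get none).
        if node.untried:
            action = node.untried.pop(rng.randrange(len(node.untried)))
            s = apply_action(node.state, action)
            child = Node(s, node.depth + 1, node,
                         untried=legal_actions(s) if node.depth + 1 < max_depth else [])
            node.children.append(child)
            node = child
        # 3. Simulation: random edits until the budget is spent; verify every state.
        state, depth = node.state, node.depth
        while True:
            score = verify(state)
            if score > best_score:
                best_state, best_score = state, score
            if depth >= max_depth:
                break
            state = apply_action(state, rng.choice(legal_actions(state)))
            depth += 1
        # 4. Backpropagation: the final rollout state's score is the reward.
        while node is not None:
            node.visits += 1
            node.total += score
            node = node.parent
    return best_state, best_score


one_pass = ("150 million", None, "Paris")  # hypothetical flawed one-pass output
best, score = mcts_refine(one_pass)
print(dict(zip(FIELDS, best)), score)
```

Because every edit may overwrite an earlier value, the search can undo a wrong first-pass choice, which mirrors the revision capability the abstract attributes to CycleIE. Returning the best-verified state seen, rather than following the most-visited path from the root, is a simplification chosen for this sketch.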
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 9054