FormCraft: Beyond Documents - Benchmarking Form Intelligence

17 Sept 2025 (modified: 02 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Form Understanding, Benchmark, Multi-modal Large Language Model
Abstract: Current document AI benchmarks have reached a plateau: they primarily evaluate isolated OCR and Visual Question-Answering (VQA) tasks and fail to capture the holistic understanding required for real-world forms. We introduce FORMCRAFT, a structure- and relation-centric benchmark for form intelligence in vision–language models (VLMs). Unlike OCR- or DocVQA-centric evaluations, FORMCRAFT operationalizes a three-level taxonomy, Content Modality (L1), Layout Structure (L2), and Semantic Relation (L3), into targeted tasks and structure-aware metrics. On real-world forms with professional annotation, we find that recognition is relatively strong, while hierarchical reconstruction and cross-field consistency remain challenging across popular open-source and proprietary models. Upon paper acceptance, we will release the dataset and annotation schema to standardize future research.
Primary Area: datasets and benchmarks
Submission Number: 9494