Data-driven Design as a High-Impact, Ecologically Valid Benchmark for Document Understanding

Published: 24 Sept 2025, Last Modified: 26 Dec 2025, Venue: NeurIPS2025-AI4Science Poster, License: CC BY 4.0
Additional Submission Instructions: For the camera-ready version, please include the author names and affiliations, funding disclosures, and acknowledgements.
Track: Track 2: Dataset Proposal Competition
Keywords: Materials science, data-driven design, document understanding, information extraction
TL;DR: Data-driven design is a task that is both impactful for materials discovery and at the frontier of document-based AI capabilities.
Abstract: Data-driven design (DDD) is viewed in materials science as a promising way to accelerate materials discovery by narrowing the search space for candidate materials with desirable properties. It relies on information correctly extracted from prior literature. Existing methods for DDD-related information extraction, however, depend on either laborious, hand-engineered pipelines or the annotation of large amounts of hard-to-collect data. We therefore propose DDD as a benchmark for zero- and few-shot document understanding focused on text, tables, and charts. Accurate generalization to unseen material domains would accelerate scientific discovery by enabling the use of DDD in previously unexplored domains.
Submission Number: 209