Submission Track: Findings & Open Challenges (Tiny Paper)
Submission Category: Automated Material Characterization
Keywords: Multi-agent systems, Large Language Models, Multimodal data extraction, Nanomaterials, Nanozymes, Automatic knowledge extraction
Supplementary Material: zip
TL;DR: A multi-agent system (nanoMINER) that leverages LLMs and vision models to automatically extract structured nanomaterials data from scientific literature, reaching near-perfect precision (0.98) for specific numerical parameters.
Abstract: Automating structured data extraction from scientific literature is a critical challenge with broad implications across domains. We present nanoMINER, a multi-agent system that integrates large language models and multimodal analysis to extract scientific data on nanomaterials. At its core, a ReAct agent orchestrates specialized agents to ensure comprehensive data extraction. We demonstrate its efficacy by automating the assembly of nanomaterial and nanozyme datasets that domain experts had previously compiled by hand. While we achieve near-perfect extraction precision (0.98) for specific numerical parameters and excellent extraction quality for textual parameters, significant challenges remain in multimodal integration, visual data interpretation, and cross-format generalization. This paper examines the engineering complexities of scientific data extraction systems and highlights the open challenges that must be addressed to fully automate the knowledge extraction pipeline. We discuss how solving these challenges could dramatically accelerate materials discovery by eliminating manual data extraction bottlenecks and enabling truly data-driven research.
Submission Number: 66
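The abstract describes a ReAct agent that orchestrates specialized agents. The minimal sketch below illustrates that control pattern only; it is not nanoMINER's implementation. It makes several assumptions: a rule-based `policy` stands in for the LLM's Thought step, regular expressions stand in for an NER model, and pre-parsed figure captions stand in for a vision model. All function names, field names (`Km`, `Vmax`, `length_nm`), and document keys are hypothetical.

```python
"""Sketch of ReAct-style orchestration over specialized extraction agents.

Hypothetical throughout: a real system would let an LLM choose each action
and would call NER / vision models instead of these regex stand-ins.
"""
import re
from typing import Callable, Dict, List, Optional, Set, Tuple

# A specialized agent maps (document, current record) -> observed fields.
Agent = Callable[[dict, dict], dict]

REQUIRED = ("Km", "Vmax", "length_nm")  # hypothetical target schema


def text_agent(doc: dict, record: dict) -> dict:
    """Stand-in for a text/NER agent: pull kinetic parameters with units."""
    found = {}
    for name in ("Km", "Vmax"):
        m = re.search(rf"{name}\s*=\s*([0-9.]+)\s*([A-Za-z/]+)", doc.get("text", ""))
        if m:
            found[name] = (float(m.group(1)), m.group(2))
    return found


def vision_agent(doc: dict, record: dict) -> dict:
    """Stand-in for a vision agent: here it only reads figure captions."""
    found = {}
    for caption in doc.get("figure_captions", []):
        m = re.search(r"size of ([0-9.]+)\s*nm", caption)
        if m:
            found["length_nm"] = float(m.group(1))
    return found


AGENTS: Dict[str, Agent] = {"text": text_agent, "vision": vision_agent}


def policy(record: dict, tried: Set[str]) -> Optional[str]:
    """Stand-in for the LLM 'Thought' step: pick the next agent or stop."""
    if all(field in record for field in REQUIRED):
        return None  # every target field is filled, so finish
    for name in AGENTS:
        if name not in tried:
            return name
    return None  # no untried agent remains


def react_extract(doc: dict, max_steps: int = 5) -> Tuple[dict, List[str]]:
    """Alternate Action -> Observation until the policy stops or steps run out."""
    record: dict = {}
    trace: List[str] = []
    tried: Set[str] = set()
    for _ in range(max_steps):
        action = policy(record, tried)
        if action is None:
            break
        tried.add(action)
        observation = AGENTS[action](doc, record)
        trace.append(f"Action: {action} -> Observation: {sorted(observation)}")
        for key, value in observation.items():
            record.setdefault(key, value)  # earlier agents' values take priority
    return record, trace


# Usage on a toy document (invented text, not data from the paper).
doc = {
    "text": "The nanozyme showed Km = 0.12 mM and Vmax = 3.5 uM/min.",
    "figure_captions": ["TEM image; particles have a size of 25 nm."],
}
record, trace = react_extract(doc)
```

Keeping the policy separate from the agents mirrors the orchestration idea: swapping the rule-based `policy` for an LLM call would change how the next action is chosen without touching the specialized agents.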