Towards Multimodal Data-Driven Scientific Discovery Powered by LLM Agents

ICLR 2026 Conference Submission 16868 Authors

19 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: Data-driven Scientific Discovery, LLM Agent
Abstract: Recent advances in large language models (LLMs) have enabled agents that automate scientific discovery by interpreting data, generating analysis pipelines, and executing them with computational tools. However, existing benchmarks remain largely limited to unimodal datasets and slice-level tasks, overlooking the fact that real discovery requires multimodal integration, modeling, and hypothesis-driven reasoning. To address this gap, we introduce MoSciBench, the first benchmark for multimodal scientific discovery, constructed from peer-reviewed studies through a principled four-stage pipeline. MoSciBench spans six scientific domains, seven data modalities, and five categories of discovery questions, yielding 88 end-to-end, data-driven tasks. Each task is designed as a cross-modal hypothesis verification workflow, requiring agents to align and integrate heterogeneous datasets before modeling and reasoning. We further evaluate four representative agent frameworks across multiple LLM families. Results show that multimodal discovery is substantially harder than unimodal tasks: even the strongest agents achieve only 48.94% accuracy, with over 60% of failures attributable to cross-modal alignment errors. Lightweight workflow scaffolding consistently improves performance, reducing alignment errors by 5–10% and raising accuracy by 5.7% on average. Our benchmark and evaluation framework thus establish a rigorous testbed for advancing LLM agents toward realistic, multimodal scientific discovery.
Primary Area: datasets and benchmarks
Submission Number: 16868