$S^3$-Bench: A Comprehensive Study of Multimodal LLMs for Scientific Discovery with Benchmarking

18 Sept 2025 (modified: 11 Feb 2026), Submitted to ICLR 2026, CC BY 4.0
Keywords: MLLMs, AI for Science, Survey, Benchmark, drug discovery, molecular & protein design, materials science, genomics.
TL;DR: A systematic review of the progress of MLLMs in key scientific domains, including drug discovery, molecular & protein design, materials science, and genomics.
Abstract: Recent advances in artificial intelligence (AI), especially large language models, have accelerated the integration of multimodal data in scientific research. Scientific fields involve diverse data types, ranging from text and images to complex biological sequences and structures, and multimodal large language models (MLLMs) have emerged as powerful tools for bridging these modalities, enabling more comprehensive data analysis and decision-making. This work, $S^3$-Bench, provides a comprehensive overview of recent advances in MLLMs and their applications across science. We systematically review the progress of MLLMs in key scientific domains, including drug discovery, molecular & protein design, materials science, and genomics. The review covers model architectures, domain-specific adaptations, benchmark datasets, and promising future directions. In addition, we benchmark open-source models on several important tasks, such as molecular property prediction and protein function prediction. Our work aims to serve as a resource for researchers and practitioners interested in the rapidly evolving landscape of multimodal AI for science.
Primary Area: datasets and benchmarks
Submission Number: 10494