Exploring Agentic Multimodal Large Language Models: A Survey for AIScientists

Jinglin Jian, Yi R. (May) Fung, Denghui Zhang, Yiqian Liang, Qingyu Chen, Zhiyong Lu, Qingyun Wang

Published: 18 Nov 2025, Last Modified: 17 Mar 2026 | Crossref | CC BY-SA 4.0

Abstract: The emergence of agentic Multimodal Large Language Models (MLLMs) has catalyzed a new paradigm in scientific discovery, enabling systems to autonomously understand, reason, and act across diverse modalities. Agentic MLLMs are becoming the next frontier for AIScientists, systems capable of assisting with or even independently conducting every stage of the scientific research cycle. This paper presents a systematic taxonomy and framework for the development of scientific MLLM agents, covering multimodal perception, training and inference methodologies, evaluation benchmarks, and human–AIScientist collaboration. We further outline persistent challenges such as data scarcity, reliability, and interpretability, emphasizing the importance of transparent and controllable interaction frameworks. By framing agentic MLLMs as collaborative partners that augment rather than replace human scientists, we advocate for a balanced vision of AIScientists, one that advances discovery while promoting democracy and inclusivity in scientific progress.

External IDs: doi:10.36227/techrxiv.176344216.60619335/v1