Keywords: Molecular Grounding, Language-Structure Reasoning, AI for Science
Abstract: Current molecular understanding approaches predominantly focus on the descriptive aspect of human perception, providing broad, topic-level insights. However, the referential aspect, i.e., linking molecular concepts to specific structural components, remains largely unexplored. To address this gap, we propose a molecular grounding benchmark designed to evaluate a model's referential abilities. This benchmark emphasizes fine-grained understanding and interpretability, challenging models to answer queries such as “What?”, “Where?”, and “Which ones?” across various cognitive levels. We align molecular grounding with established conventions in NLP, cheminformatics, and molecular science, showcasing the potential of NLP techniques to advance molecular understanding within the AI for Science movement. Specifically, we introduce the largest molecular grounding benchmark to date, consisting of 187k QA pairs across five tasks, each targeting a distinct cognitive level. Extensive evaluations of both general-purpose and domain-specific (M)LLMs highlight the challenges posed by this benchmark. While existing techniques, such as in-context learning, fine-tuning, and multi-agent strategies, can improve performance, significant progress is still needed to enhance referential capabilities. Furthermore, we demonstrate that molecular grounding can also benefit traditional tasks such as molecular captioning and Anatomical Therapeutic Chemical (ATC) classification. The source code and data are available at https://anonymous.4open.science/r/MolGround-2025/.
Primary Area: datasets and benchmarks
Submission Number: 5621