Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model

Published: 01 Jan 2025, Last Modified: 14 May 2025CoRR 2025EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Understanding molecules is key to understanding organisms and driving advances in drug discovery, requiring interdisciplinary knowledge across chemistry and biology. Although large molecular language models have achieved notable success in interpreting molecular structures, their instruction datasets are limited to the specific knowledge from task-oriented datasets and do not fully cover the fundamental characteristics of molecules, hindering their abilities as general-purpose molecular assistants. To address this issue, we propose Mol-LLaMA, a large molecular language model that grasps the general knowledge centered on molecules via multi-modal instruction tuning. To this end, we design key data types that encompass the fundamental features of molecules, incorporating essential knowledge from molecular structures. In addition, to improve understanding of molecular features, we introduce a module that integrates complementary information from different molecular encoders, leveraging the distinct advantages of different molecular representations. Our experimental results demonstrate that Mol-LLaMA is capable of comprehending the general features of molecules and generating relevant responses to users' queries with detailed explanations, implying its potential as a general-purpose assistant for molecular analysis. Our project page is at https://mol-llama.github.io/.
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview