Abstract: Molecular Relational Learning (MRL), which aims to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research. Recently, the adoption of large language models (LLMs), known for their vast knowledge repositories and advanced logical inference capabilities, has emerged as a promising approach to efficient and effective MRL. Despite their potential, these methods rely predominantly on textual data and thus do not fully harness the wealth of structural information inherent in molecular graphs. Moreover, the absence of a unified framework exacerbates the issue of insufficient data exploitation, as it hinders the sharing of interaction mechanisms learned across various datasets. To address these challenges, this work proposes a novel LLM-based multi-modal framework for molecular interaction modeling following Chain-of-Thought (CoT) theory, termed MolTC, which effectively integrates the graphical information of the two molecules in a pair. To achieve a unified training paradigm, MolTC develops a dynamic parameter-sharing strategy for cross-dataset information exchange. Moreover, to train this integrated framework efficiently, we introduce a multi-hierarchical CoT theory to refine its training paradigm, and construct a comprehensive molecular Interactive Instructions dataset for the development of biochemical LLMs involving MRL. Our experiments, conducted across various datasets involving over 4,000,000 molecular pairs, demonstrate the superiority of our method over current GNN-based and LLM-based baselines. Code is available at https://anonymous.4open.science/r/MolTC-F.
Paper Type: long
Research Area: NLP Applications
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English, Molecule Sequences