CaLMol: Disentangled Causal Graph LLM for Molecular Relational Learning

27 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0
Keywords: Molecular Relational Learning, Large language Model, Graph Neural Network, Causal Learning
Abstract: Molecular Relational Learning (MRL), focused on understanding interactions between molecular pairs, is essential for drug design by utilizing both structural properties and textual knowledge, such as expert documents. However, most existing MRL methods assume static molecular distributions, meaning the distributions remain consistent across training and testing stages. This assumption may lead to the exploitation of variant correlations between structures and texts regarding interactions, thereby failing in the ubiquitous scenarios involving new drug predictions. To bridge this gap, we investigate zero-shot MRL by leveraging invariant relationships between molecular texts and structures w.r.t interactions for new molecules, which is largely unexplored in the literature and is highly non-trivial with following challenges: 1) How to disentangle molecular structure components between each pair to intrinsically determine interactions and address potential structural distribution shift issues for new drugs? 2) How to align molecular structures with semantic textual information to achieve invariant molecular relation predictions for new drugs? To tackle these challenges, we propose a novel Causally Disentangled Invariant Graph Large Language Model (LLM) for Molecular Relational Learning (CaLMol), capable of exploiting invariant molecular relationships to predict interactions for new drugs. Specifically, we propose Causal Molecule Substructure Disentanglement to capture the invariant well-recognized substructure pair for a specific molecule interaction. Then, we propose Molecule Structure and Property aware LLM Alignment to use molecule (with invariant substructure)-textual property pair to align structure information to semantic information, and use them together to guide the interaction prediction. On this basis, LLM can also provide further explanations. Extensive experiments on qualitative and quantitative tasks including 7 datasets demonstrate that our proposed CaLMol achieves advanced performance on predicting molecule interactions involving new molecules.
Supplementary Material: pdf
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9855
Loading