Holistic Molecular Representation Learning via Multi-view Fragmentation

Published: 07 Jun 2024, Last Modified: 07 Jun 2024, Accepted by TMLR
Abstract: Learning chemically meaningful representations from unlabeled molecules plays a vital role in AI-based drug design and discovery. In response, several self-supervised learning methods have been developed that focus on either global (e.g., graph-level) or local (e.g., motif-level) information in molecular graphs. However, it remains unclear which approach learns better molecular representations. In this paper, we propose a novel holistic self-supervised molecular representation learning framework that learns both global and local molecular information. Our key idea is to use fragmentation, which decomposes a molecule into a set of chemically meaningful fragments (e.g., functional groups), to associate the global graph structure with a set of local substructures while preserving chemical properties, and to learn both kinds of information via contrastive learning between them. We additionally treat the 3D geometry of molecules as another view for contrastive learning. We demonstrate that our framework outperforms prior molecular representation learning methods across various molecular property prediction tasks.
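To make "contrastive learning between a molecule and its fragments" concrete, here is a minimal sketch in plain Python. It is not the HoliMol implementation (see the linked repository for that); it assumes hypothetical precomputed embeddings (in practice these would come from a graph encoder), mean pooling over fragment embeddings, cosine similarity, a temperature of 0.1, and the standard InfoNCE objective as a stand-in for the paper's contrastive loss. Each molecule in a batch is pulled toward its own fragment set and pushed away from the other molecules' fragment sets.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(fragments: List[Vector]) -> Vector:
    """Average fragment embeddings into one set-level vector (assumed aggregation)."""
    dim = len(fragments[0])
    return [sum(f[i] for f in fragments) / len(fragments) for i in range(dim)]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def info_nce(molecule_embs: List[Vector],
             fragment_sets: List[List[Vector]],
             temperature: float = 0.1) -> float:
    """InfoNCE loss: molecule i is the positive pair of fragment set i;
    every other fragment set in the batch serves as a negative."""
    frag_embs = [mean_pool(frags) for frags in fragment_sets]
    n = len(molecule_embs)
    total = 0.0
    for i in range(n):
        logits = [cosine(molecule_embs[i], frag_embs[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # negative log-softmax of the positive
    return total / n


# Usage with two toy molecules whose fragment embeddings point the same way.
mols = [[1.0, 0.0], [0.0, 1.0]]
frags = [[[1.0, 0.1], [0.9, -0.1]], [[0.1, 1.0], [-0.1, 0.8]]]
aligned = info_nce(mols, frags)         # small: each molecule matches its own fragments
swapped = info_nce(mols, frags[::-1])   # large: positives are now mismatched
```

Under the same assumptions, the function can also sketch the 2D-versus-3D view: passing each 3D-geometry embedding as a single-element "fragment set" makes `mean_pool` return it unchanged, so the loss contrasts 2D graph embeddings with 3D embeddings of the same molecules.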
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We have included the required changes from the rebuttal and the meta-review.
Code: https://github.com/Seojin-Kim/HoliMol
Supplementary Material: zip
Assigned Action Editor: Ying Wei
Submission Number: 2012