Holistic Molecular Representation Learning via Multi-view Fragmentation

TMLR Paper2012 Authors

04 Jan 2024 (modified: 23 Apr 2024). Decision pending for TMLR.
Abstract: Learning chemically meaningful representations from unlabeled molecules plays a vital role in AI-based drug design and discovery. To this end, several self-supervised learning methods have been developed, focusing on either global (e.g., graph-level) or local (e.g., motif-level) information of molecular graphs. However, it remains unclear which approach is more effective for learning better molecular representations. In this paper, we propose a novel holistic self-supervised molecular representation learning framework that effectively learns both global and local molecular information. Our key idea is to utilize fragmentation, which decomposes a molecule into a set of chemically meaningful fragments (e.g., functional groups), to associate a global graph structure with a set of local substructures, thereby preserving chemical properties, and to learn both kinds of information via contrastive learning between them. Additionally, we consider the 3D geometry of molecules as another view for contrastive learning. We demonstrate that our framework outperforms prior molecular representation learning methods across various molecular property prediction tasks.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Ying_Wei1
Submission Number: 2012