Abstract: Molecular representation learning is central to molecular property prediction, which is a vital component in drug discovery. Existing methods, which mainly focus on atom-level molecular graphs, often find it challenging to directly model the relation between the fragments (substructures) and the function of molecules, largely due to insufficient fragment priors. In this work, we propose a molecular self-supervised learning framework, \textbf{FragFormer}, which aims to learn the representation of fragments and their contextual relationships. Given the prior that an atom can be part of multiple functional groups, we develop $k$-\textbf{D}egree \textbf{Ove}rlapping fragmentation (\textbf{DOVE}), which generates an overlapping fragment graph by employing the iterative line graph. In addition, unlike non-overlapping fragmentation, DOVE preserves connection information during the fragmentation phase. In the pre-training stage, we design a \textit{nested masked fragment prediction} objective to capture the hierarchical nature of fragments, namely that larger fragments can encompass multiple smaller ones. Based on FragFormer, we introduce a simple yet efficient \textit{fragment-level} interpretation method, \textbf{FragCAM}, which interprets molecular property prediction results with greater accuracy. Moreover, thanks to the fragment modeling, our model is more capable of processing large molecules, such as peptides, and of capturing the long-range interactions inside molecules. Our approach achieves state-of-the-art (SOTA) performance on eight out of eleven molecular property prediction datasets on PharmaBench. On a long-range biological benchmark with peptide data, FragFormer outperforms strong baselines by a clear margin, which shows the model's potential to generalize to larger molecules.
Finally, we demonstrate that our model can effectively identify decisive fragments for prediction results on a real-world dataset\footnote{Our code is available at \url{https://github.com/wjxts/FragFormer/}}.
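To illustrate how an iterated line graph can yield overlapping fragments, here is a minimal Python sketch on a toy molecular graph. It is not the authors' DOVE implementation: the function names `line_graph_step` and `iterated_line_graph`, the list-based graph encoding, and the choice to track each node's covered atoms as a set are all assumptions made for illustration.

```python
from itertools import combinations


def line_graph_step(fragments, edges):
    """One line-graph step (illustrative; not the paper's code).

    fragments: list of frozensets, the atoms covered by each current node.
    edges: list of (i, j) index pairs between current nodes.
    """
    # Each edge of the current graph becomes a node of the line graph,
    # covering the union of the atoms of its two endpoints.
    new_fragments = [fragments[i] | fragments[j] for i, j in edges]
    # Two new nodes are adjacent when their source edges share an endpoint.
    new_edges = [
        (a, b)
        for a, b in combinations(range(len(edges)), 2)
        if set(edges[a]) & set(edges[b])
    ]
    return new_fragments, new_edges


def iterated_line_graph(num_atoms, bonds, k):
    """Apply the line-graph step k times, starting from single-atom nodes."""
    fragments = [frozenset([i]) for i in range(num_atoms)]
    edges = list(bonds)
    for _ in range(k):
        fragments, edges = line_graph_step(fragments, edges)
    return fragments, edges


# Toy example: a four-atom chain 0-1-2-3 (hypothetical input).
frags, frag_edges = iterated_line_graph(4, [(0, 1), (1, 2), (2, 3)], k=2)
print([sorted(f) for f in frags])  # [[0, 1, 2], [1, 2, 3]]
print(frag_edges)                  # [(0, 1)]
```

In this toy chain, atoms 1 and 2 belong to both fragments, and the single fragment-level edge records that the two fragments are connected, which is the kind of overlap and connectivity the abstract describes.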
Submission Length: Long submission (more than 12 pages of main content)
Code: https://github.com/wjxts/FragFormer
Supplementary Material: zip
Assigned Action Editor: ~Hankook_Lee1
Submission Number: 3785