Keywords: Molecular Representation Learning, Graph Neural Network
Abstract: Molecular representation learning with graph neural networks (GNNs) has become a research hotspot in chemistry and biology in recent years. The pretraining-finetuning paradigm is widely used to address the scarcity of labeled molecular data and has achieved great success through its ability to leverage large amounts of unlabeled data. Moreover, frequently occurring molecular substructures, known as motifs, often capture the local information and higher-order connectivity of molecules more effectively, offering a better basis for pretraining. However, existing motif extraction methods rely on domain-specific knowledge and neglect the local structural information of atoms. To address these problems, we propose a motif self-extraction method based on a graph autoencoder: the autoencoder performs structural reconstruction, allowing the model to automatically identify frequently occurring local patterns. We further propose a motif-based pretraining method that simultaneously captures the local information and higher-order connections of both the molecular graph and the motif graph. We pretrain on the 250K ZINC15 dataset and evaluate downstream performance on eight commonly used molecular property prediction datasets. Experimental results demonstrate the effectiveness of our method.
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 16053