Distilling Pre-trained Knowledge in Chemical Reactions for Molecular Property Prediction

22 Sept 2022 (modified: 13 Feb 2023) ICLR 2023 Conference Withdrawn Submission
Keywords: Molecular property prediction, Chemical reactions, Pre-training for molecular representations, Knowledge distillation, AI for drug discovery
Abstract: How to effectively represent molecules is a long-standing challenge for molecular property prediction and drug discovery. Recently, the accumulation of unlabelled molecule data has spurred rapid development of pre-training methods for molecular representation learning. However, these works mainly focus on devising self-supervised learning tasks and/or introducing 3D geometric information based on molecular structures, with little chemical domain knowledge involved. To address this issue, we propose a novel method (MolKD) that Distils pre-trained Knowledge in chemical reactions to assist Molecular property prediction. Specifically, MolKD first learns effective representations by incorporating reaction yields to measure the transformation efficiency of each reactant–product pair when pre-training on reactions. Next, MolKD introduces reaction-to-molecule distillation to transfer cross-modal knowledge from the pre-training chemical reaction data to downstream molecular property prediction tasks. Extensive experiments show that our method learns effective molecular representations, achieving superior performance over state-of-the-art baselines, e.g., a 2.8% absolute Hit@1 gain on USPTO in chemical reaction prediction and a 1.6% absolute AUC-ROC gain on Tox21 with 1/3 of the pre-training data size in molecular property prediction. Further investigation of the pre-trained molecular representations indicates that MolKD learns to distinguish chemically meaningful molecular similarities, which enables molecular property prediction with high robustness and interpretability.
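The reaction-to-molecule distillation described in the abstract can be loosely sketched as training a student molecule encoder to match embeddings produced by a teacher pre-trained on chemical reactions. The function name, the L2 normalisation, and the MSE objective below are illustrative assumptions for a minimal sketch, not the authors' actual implementation.

```python
import numpy as np

def distillation_loss(teacher_emb: np.ndarray, student_emb: np.ndarray) -> float:
    """Mean squared error between L2-normalised teacher and student embeddings.

    teacher_emb: embeddings from a (hypothetical) reaction-pre-trained encoder.
    student_emb: embeddings from the downstream molecule encoder being trained.
    Both arrays have shape (batch_size, embedding_dim).
    """
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=-1, keepdims=True)
    s = student_emb / np.linalg.norm(student_emb, axis=-1, keepdims=True)
    return float(np.mean((t - s) ** 2))

# Toy usage: identical embeddings give zero loss; orthogonal ones do not.
emb = np.random.rand(4, 128)
print(distillation_loss(emb, emb))  # 0.0
```

In practice such a term would be added to the downstream task loss with a weighting coefficient, so the student benefits from reaction-level knowledge while still fitting the property labels.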
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Machine Learning for Sciences (eg biology, physics, health sciences, social sciences, climate/sustainability )
TL;DR: We propose a novel method to incorporate chemical domain knowledge for molecular property prediction.