Unsupervised 2D Molecule Drug-likeness Prediction based on Knowledge Distillation

25 Sept 2024 (modified: 16 Jan 2025) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Drug-likeness Prediction, Molecule Representation, Molecular Property Prediction
Abstract: Owing to its research significance and application value, drug-likeness prediction, which aims to accurately screen high-quality drug candidates, has attracted increasing attention recently. Dominant studies can be roughly classified into two categories: (1) supervised drug-likeness prediction based on binary classifiers, where the common practice is to treat real drugs as positive examples and other molecules as negative ones; however, the manual selection of negative samples introduces classification bias into these classifiers; and (2) unsupervised drug-likeness prediction based on SMILES representations, such as an RNN-based language model trained on real drugs. Nevertheless, using SMILES to represent molecules is suboptimal for drug-likeness prediction, which is more closely related to the topological structures of molecules. Besides, the RNN model tends to assign high scores to molecules with short SMILES strings, regardless of their structures. In this paper, we propose a novel unsupervised method based on knowledge distillation, which exploits 2D features of molecules for drug-likeness prediction. The teacher model learns the topology of molecules via two pre-training tasks on a large-scale dataset, and the student model mimics the teacher model on real drugs. In this way, the outputs of the two models will be similar on drug-like molecules but significantly different on non-drug-like molecules. To demonstrate the effectiveness of our method, we conduct several groups of experiments on various datasets. Experimental results and in-depth analysis show that our method significantly surpasses all baselines, achieving state-of-the-art performance. In particular, the prediction bias toward short SMILES strings is reduced in our method. We will release our code upon the acceptance of our paper.
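The distillation-based scoring idea in the abstract can be illustrated with a minimal sketch. Note the assumptions: the paper's teacher is a graph encoder pre-trained on a large molecule corpus and its student is a network distilled on real drugs, whereas here the "teacher" is a fixed nonlinear map over toy feature vectors, the "student" is a linear model fitted to the teacher's outputs on a synthetic "drug" cluster, and the drug-likeness score is the negative teacher-student discrepancy. All names (`teacher`, `drug_likeness_score`, etc.) are hypothetical, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

D, H = 16, 8                       # toy feature / embedding dimensions
W = rng.normal(size=(D, H))

def teacher(x):
    # Frozen nonlinear "teacher"; stands in for a pre-trained 2D encoder.
    return (x @ W) ** 2

def with_bias(x):
    return np.hstack([x, np.ones((len(x), 1))])

# Synthetic "real drugs": a tight cluster in feature space.
mu = rng.normal(size=D)
drugs = mu + 0.1 * rng.normal(size=(500, D))

# Distillation step: fit the student to mimic the teacher on drugs only
# (closed-form ridge regression instead of gradient training).
X = with_bias(drugs)
S = np.linalg.solve(X.T @ X + 1e-3 * np.eye(D + 1), X.T @ teacher(drugs))

def drug_likeness_score(x):
    # Small teacher-student discrepancy => drug-like; large => non-drug-like.
    return -np.linalg.norm(teacher(x) - with_bias(x) @ S, axis=-1)

drug_like = mu + 0.1 * rng.normal(size=(50, D))    # near the training cluster
non_drug = -mu + 0.1 * rng.normal(size=(50, D))    # far from it
print(drug_likeness_score(drug_like).mean() > drug_likeness_score(non_drug).mean())
```

Because the student only learns to match the teacher where real drugs live, the two models agree there and diverge elsewhere, so the discrepancy itself acts as an unsupervised drug-likeness score, with no hand-picked negative examples needed.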
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4585