BounDr.E: Predicting Drug-likeness through knowledge alignment and EM-like one-class boundary optimization

27 Sept 2024 (modified: 24 Jan 2025) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Drug-likeness, one-class boundary, multi-modal alignment, drug discovery
TL;DR: We present a drug-likeness prediction framework that constructs a one-class boundary through EM-like optimization, with multi-modal alignment of biomedical knowledge and molecular structure.
Abstract: The advent of generative AI models is revolutionizing drug discovery, generating de novo molecules at unprecedented speed. However, accurately identifying and rescuing drug candidates among countless generated molecules remains an open problem. The essence of this drug-likeness prediction task lies in constructing a compact subspace that encompasses the majority of approved drugs while containing only a small number of unknown compounds (drug candidates). Computational challenges arise in constructing a decision boundary in an unbounded chemical space that lacks definite negatives, i.e., compounds known to be non-drug-like. Approved drugs are highly dispersed across structural space, which makes it harder for existing classifiers to separate drugs from non-drugs effectively. To address these challenges, we introduce BounDr.E: a novel approach for learning a compact boundary of drug-likeness through an Expectation-Maximization (EM)-like iterative optimization process. Specifically, we refine both the boundary and the distribution of the embedding space via metric learning, allowing the model to iteratively tighten the drug-like boundary while pushing non-drug-like compounds outside. Augmented by the integration of biomedical context from knowledge graphs via multi-modal alignment, our model demonstrates a 10% increase in F1 score over the previous state-of-the-art, along with the strongest robustness under cross-dataset validation. Zero-shot toxic compound filtering and comprehensive drug discovery pipeline case studies further showcase its utility in large-scale screening of AI-generated compounds. To facilitate in silico drug discovery, we provide the code and benchmark data under various splitting schemes at: https://anonymous.4open.science/r/boundr_e.
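The alternating scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a linear projection in place of the learned encoder, a spherical boundary defined by a distance quantile, and a simple pull/push loss, and it omits the knowledge-graph alignment entirely. All function and parameter names (`fit_boundary`, `em_like_boundary`, `is_drug_like`, `coverage`, `margin`) are hypothetical.

```python
import numpy as np

def fit_boundary(z_drug, coverage=0.95):
    """E-like step: centre and radius enclosing `coverage` of drug embeddings."""
    center = z_drug.mean(axis=0)
    dist = np.linalg.norm(z_drug - center, axis=1)
    return center, np.quantile(dist, coverage)

def em_like_boundary(x_drug, x_unlabeled, dim=2, n_iter=50, lr=0.05,
                     margin=0.5, coverage=0.95, seed=0):
    """Alternate boundary fitting with a metric-learning-style update of W.

    Assumption: embeddings are a linear map x @ W, a stand-in for an encoder.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(x_drug.shape[1], dim))
    for _ in range(n_iter):
        z_d, z_u = x_drug @ W, x_unlabeled @ W
        # E-like step: re-estimate the boundary with the embedding fixed.
        center, radius = fit_boundary(z_d, coverage)
        # M-like step: one gradient step on W with the boundary fixed.
        # Pull term: mean squared distance of drugs to the centre.
        grad_pull = 2.0 * x_drug.T @ (z_d - center) / len(x_drug)
        # Push term: hinge max(0, radius + margin - dist) on unlabeled points,
        # so only points inside or near the boundary contribute.
        diff_u = z_u - center
        dist_u = np.linalg.norm(diff_u, axis=1) + 1e-12
        active = dist_u < radius + margin
        unit = diff_u[active] / dist_u[active, None]
        grad_push = -(x_unlabeled[active].T @ unit) / len(x_unlabeled)
        W -= lr * (grad_pull + grad_push)
    center, radius = fit_boundary(x_drug @ W, coverage)
    return W, center, radius

def is_drug_like(x, W, center, radius):
    """Return True for compounds whose embedding lies inside the boundary."""
    return np.linalg.norm(x @ W - center, axis=1) <= radius
```

In this toy setting the inputs would be fixed-length feature vectors (for example, molecular fingerprints); unlabeled compounds are treated as probable negatives only when they fall inside or near the current boundary, which mirrors the absence of definite negatives noted in the abstract.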
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 9796