MOG: Molecular Out-of-distribution Generation with Energy-based Models

29 Sept 2021 (modified: 13 Feb 2023), ICLR 2022 Conference Withdrawn Submission, Readers: Everyone
Keywords: Drug Discovery, Molecule Generation, Energy-based Models
Abstract: Recent advances in deep generative models have opened up a new horizon for de novo drug discovery. However, a well-known problem of existing work on molecule generation is that the generated molecules closely resemble those in the training set. Models that do not require training molecules, such as RL-based models, circumvent this problem, but they lack information about existing molecules. In this paper, we propose Molecular Out-of-distribution Generation (MOG), a novel framework that explicitly generates molecules that are out-of-distribution (OOD) with respect to given molecules by combining two aspects of energy-based models (EBMs): generation and OOD detection. This is done by introducing multiple energy pivots into the Langevin dynamics used for generation and increasing the energy instead of minimizing it. We also use a property predictor to supply the property gradient of molecules to the modified Langevin dynamics. To validate the ability to explore chemical space beyond the known molecular distribution, we use MOG to generate molecules whose docking scores have high absolute values; the docking score is an affinity score based on a physical binding simulation between a target protein and a given molecule. Unlike penalized logP or QED, the docking score is a strong proxy for drug activity, and it requires stronger exploration because it is nonlinear in local molecular structure and has many local optima. MOG generates molecules with higher docking scores than existing methods. Moreover, we show that the energy-increasing strategy based on EBMs can be applied to existing models to enhance the novelty of their outputs.
One-sentence Summary: We propose a novel framework that uses energy-based models to generate molecules that are out-of-distribution with respect to given molecules.
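
The abstract's core idea, Langevin dynamics that increases energy rather than minimizing it while also following a property gradient, can be illustrated with a minimal sketch. This is not the authors' implementation: the quadratic `energy`, the linear property predictor behind `property_grad`, the 2-D continuous state, and all parameter names and values are assumptions chosen for illustration. The "energy pivots" are not modeled here, because the abstract does not define them.

```python
import numpy as np

def energy(x, center):
    # Toy stand-in for a learned EBM (assumption): energy is low near
    # `center`, which plays the role of the known molecular distribution.
    return 0.5 * np.sum((x - center) ** 2, axis=-1)

def energy_grad(x, center):
    # Analytic gradient of the toy quadratic energy above.
    return x - center

def property_grad(x, direction):
    # Gradient of a hypothetical linear property predictor
    # p(x) = <direction, x>; it is the same vector for every sample.
    return np.broadcast_to(direction, x.shape)

def langevin(x0, center, direction, n_steps=200, step=0.01,
             prop_weight=1.0, sign=+1.0, seed=0):
    # sign=+1.0 ascends the energy (the energy-increasing variant);
    # sign=-1.0 recovers ordinary energy-minimizing Langevin sampling.
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        drift = (sign * energy_grad(x, center)
                 + prop_weight * property_grad(x, direction))
        noise = rng.standard_normal(x.shape)
        x = x + 0.5 * step * drift + np.sqrt(step) * noise
    return x
```

Running `langevin` from points near `center` with `sign=+1.0` pushes samples toward higher energy, away from the region the toy energy treats as in-distribution, while `sign=-1.0` keeps them close to it. In a real setting the state would be a molecular representation and both gradients would come from trained networks.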