Ensemble Guidance: Towards Generative 3D SBDD in Bioactive Chemical Spaces

Published: 17 Jun 2024, Last Modified: 17 Jul 2024, ICML2024-AI4Science Poster, CC BY 4.0
Keywords: molecule design, generative modelling, diffusion models
Abstract: Many works use diffusion generative modelling for 3D structure-based drug design (SBDD). However, one critical issue that remains unaddressed is the data these models are trained on. The data are predominantly sourced from the Protein Data Bank (PDB); these datasets capture a severely constrained and skewed subset of chemical space, heavily biasing generated molecules towards being non-drug-like whilst significantly narrowing the diversity of the chemical landscapes that generative models observe during training. While there is some evidence that these methods can generate complementary molecules, this bias raises concerns about their efficacy in novel hit discovery compared to virtual screening of large molecule libraries. Here, we introduce ensemble guidance, a technique for composing learned distributions from multiple diffusion models to guide SBDD models to generate molecules with more appropriate properties and higher diversity. For example, ensemble guidance reduces the frequency of highly polar phosphate groups from 0.32 per molecule to 0. Finally, we propose many areas of future work and hope that ensemble guidance can be fruitfully applied to a number of other (bio)molecular design tasks in data-limited regimes.
Submission Number: 178
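
The abstract describes ensemble guidance as composing learned distributions from multiple diffusion models but does not give the composition rule. As a rough illustration only, the sketch below assumes one common way of composing diffusion models: taking a weighted sum of each model's noise prediction at a denoising step, which is equivalent to a weighted sum of their score estimates. The function name `compose_noise_predictions`, the two toy models, and the weights are all hypothetical and are not taken from the paper.

```python
import numpy as np

def compose_noise_predictions(models, weights, x_t, t):
    """Weighted sum of per-model noise predictions at one denoising step.

    Assumption: each model is a callable (x_t, t) -> array shaped like x_t.
    This is a generic composition rule, not necessarily the paper's.
    """
    preds = np.stack([m(x_t, t) for m in models])  # (n_models, *x_t.shape)
    w = np.asarray(weights, dtype=float)
    w = w.reshape((-1,) + (1,) * x_t.ndim)         # broadcast over x_t dims
    return (w * preds).sum(axis=0)

# Toy stand-ins for trained denoisers (hypothetical, for illustration only).
def sbdd_model(x_t, t):
    return 0.1 * x_t

def druglike_model(x_t, t):
    return -0.3 * x_t

x_t = np.ones((4, 3))  # e.g. 4 atoms with 3D coordinates
eps = compose_noise_predictions(
    [sbdd_model, druglike_model], [0.5, 0.5], x_t, t=10
)
```

In such a scheme the weights control how strongly each model steers sampling; setting one weight to zero recovers the other model's prediction unchanged.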