Steering 3D Molecule Generation in Data-Sparse Regions via Distributional Physical Priors

24 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Molecule Generation, Diffusion Model, Out of distribution
TL;DR: This work presents a novel and principled diffusion-based approach that allows steering a generative model to desired new distributions.
Abstract: Can we train a 3D molecule generator on data from dense regions and have it generate samples in sparse regions? This challenge can be framed as an out-of-distribution (OOD) generation problem. Existing work on OOD generation focuses primarily on property shifts. However, distribution shifts may also come from structural variations in molecules, such as particular types of scaffolds, which we refer to as physical priors. This work introduces a novel and principled diffusion-based generative framework, termed _GODD_, in which a generator trained on data-abundant distributions generalizes to data-scarce distributions under structure shifts. Specifically, we propose a dedicated equivariant asymmetric autoencoder to capture distributional physical priors. The asymmetric module allows generalization to unseen, out-of-distribution structural variations. Because the captured physical priors represent distinct distributions, they can steer generation toward samples that lie outside the dense regions. We demonstrate that, with these encoded structural-grained distributional physical priors, _GODD_ does not need to be trained on any molecules from the sparse regions. We conduct extensive experiments across various out-of-distribution molecule generation tasks on benchmark datasets. Compared to alternative baselines, our approach improves the success rate, defined in terms of molecular validity, uniqueness, and novelty, by up to 65.6%. Additionally, we show that our generative framework, steered by physical priors, can be readily adapted to canonical fragment-based drug design tasks, where it exhibits promising performance.
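The abstract says only that the success rate is defined based on validity, uniqueness, and novelty; the exact formula is not given here. As a rough illustration, the sketch below assumes one common reading: the fraction of generated molecules that are valid, not a duplicate of an earlier sample, and absent from the training set. The function name `success_rate`, the representation of molecules as canonical strings, and the `is_valid` predicate are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Iterable, Set


def success_rate(
    generated: Iterable[str],
    training_set: Set[str],
    is_valid: Callable[[str], bool],
) -> float:
    """Fraction of generated samples that are valid, unique, and novel.

    Assumption: molecules are compared as canonical strings (for example
    canonical SMILES), and `is_valid` is a caller-supplied validity check.
    """
    samples = list(generated)
    if not samples:
        return 0.0

    seen: Set[str] = set()
    successes = 0
    for mol in samples:
        if not is_valid(mol):
            continue  # fails validity
        if mol in seen:
            continue  # fails uniqueness (duplicate of an earlier sample)
        seen.add(mol)
        if mol in training_set:
            continue  # fails novelty (already in the training data)
        successes += 1
    return successes / len(samples)


if __name__ == "__main__":
    # Toy data: "bad" stands in for a molecule that fails the validity check.
    gen = ["CCO", "CCO", "C1CC1", "bad", "CCN"]
    rate = success_rate(gen, training_set={"CCN"}, is_valid=lambda s: s != "bad")
    print(f"success rate: {rate:.2f}")
```

In practice the validity check would typically come from a cheminformatics toolkit, and the denominator (all samples versus valid samples only) is a design choice that may differ from the paper's definition.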
Supplementary Material: zip
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 3666