Geometric Representation Condition Improves Equivariant Molecule Generation

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 spotlight poster, CC BY 4.0
TL;DR: We propose a two-stage, model-agnostic generative approach that effectively leverages molecule representations to improve the generation quality of molecule generative models.
Abstract: Recent advances in molecular generative models have demonstrated great promise for accelerating scientific discovery, particularly in drug design. However, these models often struggle to generate high-quality molecules, especially in conditional scenarios where specific molecular properties must be satisfied. In this work, we introduce GeoRCG, a general framework that improves molecular generative models by integrating geometric representation conditions, with provable theoretical guarantees. We decompose the generation process into two stages: first, generating an informative geometric representation; second, generating a molecule conditioned on that representation. Compared with single-stage generation, the easy-to-generate representation from the first stage guides the second stage toward a high-quality molecule in a goal-oriented way. Using EDM and SemlaFlow as base generators, we observe significant quality improvements in unconditional molecule generation on the widely used QM9 and GEOM-DRUG datasets. More notably, in the challenging conditional molecular generation task, our framework achieves an average 50% performance improvement over state-of-the-art approaches, highlighting the benefit of conditioning on semantically rich geometric representations. Furthermore, with such representation guidance, the number of diffusion steps can be reduced to as few as 100 while largely preserving the generation quality achieved with 1,000 steps, substantially reducing the number of generation iterations required.
Lay Summary: Designing new drugs with specific properties is a complex and time-consuming task. Recent advances in artificial intelligence offer hope through models that quickly generate candidate molecules based on patterns learned from large amounts of data. However, these generative models often struggle to produce high-quality molecules, especially when specific characteristics are required. To address this, we introduce GeoRCG, a new method that improves the ability of generative models to produce better molecules. Instead of creating a molecule in a single stage, our approach breaks the process into two stages. First, it generates an informative representation: a numerical vector that captures the molecule’s essential features in a form the computer can easily learn. Then, this representation guides the generation of the actual molecule. This two-stage method keeps the AI focused and efficient, resulting in higher-quality molecules through a goal-oriented process. GeoRCG can be integrated into any molecule generative model. In our experiments, it substantially improves well-known models such as EDM and SemlaFlow, delivering consistent gains in unconditional molecule generation and achieving a 31% improvement in generating molecules with specific properties. Additionally, it accelerates generation by reducing the number of iterations required, making it a valuable tool for advancing drug design and discovery.
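The two-stage flow described in the abstract can be illustrated with a minimal control-flow sketch. Everything here is an assumption made for illustration: the names `sample_representation`, `generate_molecule`, and `two_stage_sample` are hypothetical and are not taken from GeoRCG, EDM, or SemlaFlow, and the update rule is a toy placeholder rather than a diffusion or flow-matching step.

```python
import random
from typing import List, Tuple

Vector = List[float]
Coords = List[List[float]]


def sample_representation(dim: int, rng: random.Random) -> Vector:
    """Stage 1 stand-in (hypothetical): draw a representation vector.

    A real system would sample from a learned representation generator;
    here we simply draw Gaussian noise.
    """
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def generate_molecule(rep: Vector, n_atoms: int, n_steps: int,
                      rng: random.Random) -> Coords:
    """Stage 2 stand-in (hypothetical): iteratively refine 3D coordinates.

    The representation only sets a scalar target here, to show where the
    conditioning signal enters the loop. This is NOT a real denoiser.
    """
    target = 1.0 + sum(abs(v) for v in rep) / len(rep)
    coords = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_atoms)]
    for _ in range(n_steps):
        # Toy update: blend every coordinate toward the rep-dependent target.
        coords = [[0.9 * x + 0.1 * target for x in atom] for atom in coords]
    return coords


def two_stage_sample(dim: int, n_atoms: int, n_steps: int,
                     rng: random.Random) -> Tuple[Vector, Coords]:
    """Run stage 1, then stage 2 conditioned on the stage-1 output."""
    rep = sample_representation(dim, rng)
    mol = generate_molecule(rep, n_atoms, n_steps, rng)
    return rep, mol


if __name__ == "__main__":
    rep, mol = two_stage_sample(dim=8, n_atoms=5, n_steps=100,
                                rng=random.Random(0))
    print(len(rep), len(mol), len(mol[0]))
```

The point of the sketch is only the ordering: the representation is sampled first and then passed unchanged into every refinement step of the second stage, and `n_steps` is the knob that the abstract reports can be lowered from 1,000 to 100.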
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: molecule generation, equivariant generative models, representation, geometric deep learning, diffusion models
Submission Number: 4856