Keywords: Diffusion models, molecular generation, generative models
Abstract: Diffusion models have achieved state-of-the-art performance across diverse domains, yet their application to molecular generation remains challenging. Unlike many data types whose values tolerate slight variations, such as pixel intensities in images, molecules are governed by strict geometric and chemical constraints: a minor change in the coordinates of even a single atom can yield an invalid or unstable molecule.
These constraints give rise to *highly concentrated* data distributions, forming sharp probability peaks. Moreover, these peaks are *densely packed* in configuration space: changing one atom's type, along with small but precise adjustments to its position and those of its neighbors, can result in a distinct molecule, whereas images generally require much larger perturbations to change semantic meaning.
This dense-concentrated structure makes diffusion modeling fragile: because valid regions are narrow and tightly clustered, even small deviations at intermediate timesteps can easily cross validity boundaries. Once a trajectory enters an invalid region, the generative process provides unreliable guidance, so errors accumulate over timesteps and drift the trajectory off-distribution, ultimately leading to irreparable structural violations.
To address this challenge, we formalize the notion of dense-concentrated structure in molecular distributions and analyze how discrepancies at intermediate steps propagate under reverse inference.
Building on this insight, we propose **DIST**, a plug-in corrective method that **DI**ffuses and **ST**eers the intermediate distribution, thereby realigning inference trajectories toward a valid molecular distribution. Our method is model-agnostic and can be integrated into a wide range of existing diffusion models, achieving significant improvements in performance while requiring only about half the standard number of timesteps.
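The fragility argument in the abstract can be made concrete with a toy one-dimensional calculation. The sketch below is an illustration only and is not the DIST method or any part of the authors' formalization. It assumes that each valid configuration is an interval of half-width `half_width` around a peak center, that peaks are spaced `spacing` apart, and that a deviation at an intermediate step is Gaussian with standard deviation `noise_std`. All function and parameter names are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def stay_probability(half_width: float, noise_std: float) -> float:
    """P(|eps| < half_width) for eps ~ N(0, noise_std^2).

    Toy assumption: a sample at a peak center stays valid only if the
    deviation keeps it inside the interval around that center.
    """
    return math.erf(half_width / (noise_std * math.sqrt(2.0)))


def neighbor_probability(
    half_width: float, spacing: float, noise_std: float, n_neighbors: int = 3
) -> float:
    """Probability that the deviation lands inside a neighboring peak.

    Sums over the n_neighbors closest peaks on each side. Assumes
    spacing > 2 * half_width so that the intervals do not overlap.
    """
    total = 0.0
    for k in range(1, n_neighbors + 1):
        lo = (k * spacing - half_width) / noise_std
        hi = (k * spacing + half_width) / noise_std
        # Factor 2 accounts for the symmetric peak on the other side.
        total += 2.0 * (normal_cdf(hi) - normal_cdf(lo))
    return total


if __name__ == "__main__":
    noise_std = 0.1
    # Broad, well-separated peaks (loosely "image-like" in this toy).
    print(stay_probability(0.5, noise_std), neighbor_probability(0.5, 5.0, noise_std))
    # Narrow, densely packed peaks (loosely "molecule-like" in this toy).
    print(stay_probability(0.05, noise_std), neighbor_probability(0.05, 0.2, noise_std))
```

Under these assumptions, the same deviation that is harmless for broad, well-separated peaks frequently leaves a narrow peak, and when peaks are densely packed a non-negligible share of those deviations land in a different valid region, which is the toy analogue of generating a different molecule.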
Primary Area: generative models
Submission Number: 10075