Enhancing Molecular Conformer Generation via Fragment-Augmented Diffusion Pretraining

TMLR Paper4089 Authors

30 Jan 2025 (modified: 17 Mar 2025), under review for TMLR, CC BY 4.0
Abstract: Recent advances in diffusion-based methods have shown promising results for molecular conformer generation, yet their performance remains constrained by the scarcity of training data, particularly for structurally complex molecules. In this work, we present Fragment-Augmented Diffusion (FragDiff), a data-centric augmentation strategy that incorporates chemical fragmentation techniques into the pre-training phase of modern diffusion-based generative models. Our key innovation lies in decomposing molecules into chemically meaningful fragments that serve as building blocks for systematic data augmentation, enabling the diffusion model to learn enhanced local geometry while maintaining global molecular topology. Unlike existing approaches that focus on complex architectural modifications, FragDiff adopts a data-centric paradigm orthogonal to model design. Comprehensive benchmarks show FragDiff's superior performance, especially in data-scarce scenarios. Notably, it achieves a 12.2--13.4% performance improvement on molecules 3$\times$ beyond training scale through pretraining on fragments. Overall, we establish a new paradigm integrating chemical fragmentation with diffusion models, advancing computational chemistry workflows. The code is available at https://anonymous.4open.science/r/FragDiff-BA54/.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: Yingce Xia
Submission Number: 4089