Keywords: Crystal structure prediction, Diffusion models for computational chemistry, AI for science
TL;DR: We present an ab-initio all-atom diffusion model for molecular crystal structure prediction and outperform existing machine learning methods by an order of magnitude.
Abstract: Accurately predicting experimentally realizable $3\textrm{D}$ molecular crystal structures from their $2\textrm{D}$ chemical graphs is a long-standing open challenge in computational chemistry called $\textit{crystal structure prediction}$ (CSP). Efficiently solving this problem has implications ranging from pharmaceuticals to organic semiconductors, as crystal packing directly governs the physical and chemical properties of organic solids. In this paper, we introduce $\textrm{OXtal}$, a large-scale $100\textrm{M}$ parameter all-atom diffusion model that directly learns the conditional joint distribution over intramolecular conformations and periodic packing. To efficiently scale $\textrm{OXtal}$, we abandon explicitly equivariant architectures, which impose inductive biases arising from crystal symmetries, in favor of data augmentation strategies. We further propose a novel crystallization-inspired lattice-free training scheme, $\textit{Stoichiometric Stochastic Shell Sampling}$ ($S^4$), that efficiently captures long-range interactions while sidestepping explicit lattice parametrization, thus enabling more scalable architectural choices at all-atom resolution. Trained on $600\textrm{K}$ experimentally validated crystal structures (including rigid and flexible molecules, co-crystals, and solvates), $\textrm{OXtal}$ achieves orders-of-magnitude improvements over prior $\textit{ab-initio}$ ML CSP methods, while remaining orders of magnitude cheaper than traditional quantum-chemical approaches. Specifically, $\textrm{OXtal}$ reproduces experimental structures with conformer $\mathrm{RMSD}_1<0.5$ Å and attains over 80\% lattice-match success, demonstrating its ability to model both thermodynamic and kinetic regularities of molecular crystallization.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 17588