Latent Diffusion Pretraining for Crystal Property Prediction

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Crystal Property Prediction; Latent Diffusion Based Pretraining; Graph Neural Network; Materials Science
TL;DR: We propose CrysLDNet that leverages latent-diffusion pretraining to accurately predict crystal properties with limited data.
Abstract: Fast and accurate prediction of crystal properties is a central challenge in new materials design. Graph Neural Networks have emerged as powerful tools for this task due to their ability to encode the local structural environment of atoms within a crystal. However, these models are data-hungry, and in practice, labeled data for crystal properties are very scarce. Pretrain–finetune strategies, particularly those based on diffusion models, have shown promise in addressing this limitation. In this work, we introduce a novel latent-diffusion-based pretraining framework designed to mitigate the data scarcity issue. Our approach integrates a Variational Autoencoder (VAE) with a diffusion model during the pretraining stage. The VAE encoder maps 3D crystal structures into a smooth latent space, within which the diffusion process is applied. This latent diffusion pretraining enables the graph encoder to effectively capture structural and chemical semantics from large-scale unlabeled data; the encoder can then be finetuned for specific property prediction tasks. Comprehensive experiments on popular DFT datasets for property prediction reveal that CrysLDNet significantly outperforms both training-from-scratch and pretrained baselines, with average improvements of 6.93% and 7.83% over the second-best baseline on JARVIS and MP, respectively. Additionally, the learned representations remain robust under sparse data conditions and are expressive enough to correct DFT errors when finetuned with limited experimental data.
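The pretraining recipe the abstract describes — encode a structure into a latent vector, apply the forward diffusion process there, and train a denoiser — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the linear `encode` stands in for the VAE/graph encoder, the feature dimensions and noise schedule are arbitrary, and the untrained "predictor" is just a zero baseline.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(structures, W):
    # Stand-in for the VAE encoder: projects raw per-crystal feature
    # vectors into a smooth latent space (hypothetical linear map).
    return structures @ W

def forward_diffuse(z0, t, betas):
    # Closed-form forward process at timestep t:
    #   z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(z0.shape)
    zt = np.sqrt(alpha_bar) * z0 + np.sqrt(1.0 - alpha_bar) * eps
    return zt, eps

def denoising_loss(eps_pred, eps):
    # Standard epsilon-prediction MSE objective minimized during
    # diffusion pretraining; gradients flow back into the encoder.
    return float(np.mean((eps_pred - eps) ** 2))

# Toy batch: 4 "crystals" with 16 raw features, an 8-dim latent space.
X = rng.standard_normal((4, 16))
W = rng.standard_normal((16, 8)) / 4.0
betas = np.linspace(1e-4, 0.02, 1000)  # common linear schedule

z0 = encode(X, W)
zt, eps = forward_diffuse(z0, t=500, betas=betas)
loss = denoising_loss(np.zeros_like(eps), eps)  # untrained-predictor baseline
```

After pretraining, the encoder's weights would be kept and the denoiser discarded; a small regression head on top of the latent representation is then finetuned on the scarce labeled property data.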
Supplementary Material: zip
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 17748