CLOUD: A Scalable Scientific Foundation Model for Crystal Representation Learning

Published: 11 Oct 2024, Last Modified: 12 Nov 2024, NeurIPS 2024 Workshop FM4Science Poster, CC BY 4.0
Keywords: Scientific foundation model, crystal property prediction, symmetry-aware string representation
TL;DR: We develop a foundation model for crystals that demonstrates high prediction accuracy, strong generalization capabilities, and robust scaling performance.
Abstract: Developing machine learning models for crystal property prediction has been hampered by the need for labeled data from costly experiments or Density Functional Theory (DFT) calculations, which limits dataset size and leads to poor generalization to new crystals. Foundation models (FMs) offer a potential solution through self-supervised pre-training on unlabeled datasets and scalable model performance. Yet applying FMs to crystals is challenging: existing string representations fail to capture critical structural information, and scaling analyses for FMs specialized in materials science are lacking. Herein, we propose the CrystaL fOUnDation model (CLOUD), a Transformer-based foundation model for crystal representation learning and property prediction. CLOUD uses a novel symmetry-aware string representation, eliminating the need for atomic coordinates or equivariant models. After pre-training on million-scale crystal data, CLOUD is fine-tuned and evaluated on various downstream tasks, significantly outperforming other coordinate-free models on MatBench and MatBench Discovery. In addition, CLOUD achieves state-of-the-art (SOTA) or near-SOTA performance on UnconvBench for unconventional crystal property prediction. Furthermore, the pre-trained CLOUD shows robust scaling with data and model size, suggesting its potential as a scalable solution for crystal foundation models.
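The abstract does not spell out the string format, so the following is only a rough illustration of what a coordinate-free, symmetry-aware crystal string can look like. It assumes a crystal is described solely by its space group number and the elements occupying Wyckoff positions. The `SG<number> | Element_Wyckoff ...` layout, the sorting rule, and the names `Site`, `symmetry_string`, `tokenize`, and `cell_composition` are all hypothetical and are not CLOUD's actual encoding.

```python
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    element: str  # chemical symbol, e.g. "Na"
    wyckoff: str  # multiplicity + Wyckoff letter, e.g. "4a" (hypothetical label format)


def symmetry_string(space_group: int, sites: list[Site]) -> str:
    """Build a coordinate-free, symmetry-aware string (illustrative format only).

    Only the space group number and the occupied Wyckoff sites are encoded;
    no fractional coordinates or lattice parameters appear.
    """
    if not 1 <= space_group <= 230:
        raise ValueError(f"space group must be in 1..230, got {space_group}")
    # Sort by Wyckoff letter (digits stripped), then by element, so that the
    # same crystal always maps to the same string regardless of input order.
    ordered = sorted(sites, key=lambda s: (s.wyckoff.lstrip(string.digits), s.element))
    body = " ".join(f"{s.element}_{s.wyckoff}" for s in ordered)
    return f"SG{space_group} | {body}"


def tokenize(text: str) -> list[str]:
    """Split the string into whitespace-delimited tokens (a simplistic tokenizer)."""
    return text.split()


def cell_composition(sites: list[Site]) -> dict[str, int]:
    """Count atoms per conventional cell from Wyckoff multiplicities alone."""
    counts: dict[str, int] = {}
    for s in sites:
        multiplicity = int(s.wyckoff.rstrip(string.ascii_letters))  # "4a" -> 4
        counts[s.element] = counts.get(s.element, 0) + multiplicity
    return counts


if __name__ == "__main__":
    # Rock-salt NaCl: space group 225 (Fm-3m), Na on 4a, Cl on 4b.
    nacl_sites = [Site("Cl", "4b"), Site("Na", "4a")]
    text = symmetry_string(225, nacl_sites)
    print(text)
    print(tokenize(text))
    print(cell_composition(nacl_sites))
```

For rock-salt NaCl, `symmetry_string(225, [Site("Cl", "4b"), Site("Na", "4a")])` returns `"SG225 | Na_4a Cl_4b"`, and `cell_composition` recovers 4 Na and 4 Cl per conventional cell without any atomic coordinates. Sorting the sites makes the string independent of input order, which a token-based model generally needs so that one crystal does not appear under several spellings.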
Submission Number: 65