Abstract: Designing crystal materials with desired physicochemical properties remains a fundamental challenge in materials science. While large language models (LLMs) have demonstrated strong in-context learning (ICL) capabilities across various domains, existing LLM-based crystal generation approaches are restricted to zero-shot scenarios, failing to leverage ICL for learning from a limited number of examples. This limitation prevents LLMs from fully adapting to complex, data-scarce materials design tasks, where few-shot learning could be particularly beneficial. To bridge this gap, we propose CrystalICL, a novel model designed for few-shot crystal generation. Specifically, we introduce a space-group based crystal tokenization method, which effectively reduces the complexity of modeling crystal symmetry in LLMs. Additionally, we develop a condition-structure aware hybrid instruction tuning framework and a multi-task instruction tuning strategy, enabling the model to better exploit ICL by capturing intricate relationships between crystal structures and their target properties from limited data. Extensive experiments on four crystal generation benchmarks demonstrate the superiority of CrystalICL over the leading baseline methods on conditional and unconditional generation tasks.
Paper Type: Long
Research Area: Generation
Research Area Keywords: few-shot generation
Contribution Types: Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 3816