Benchmarking LLMs for atomic-level geometric manipulation in crystals

Published: 20 Sept 2025, Last Modified: 29 Oct 2025
Venue: AI4Mat-NeurIPS-2025 Poster
License: CC BY 4.0
Keywords: Large Language Models, Material Science
Abstract: Recent advances in video generators, language-aligned robotics models, and tool-augmented design frameworks suggest that large language models (LLMs) may soon overcome their difficulties with 3D spatial reasoning. To bring these developments into materials science, we present AtomWorld, a data generator and benchmark that evaluates LLMs on atomic-level operations (e.g. inserting, moving, and rotating atoms) in CIF files. We tested the benchmark across major chat models and found that they generally take an algorithmic approach, which completed simple tasks such as adding and moving atoms but struggled with more complex tasks such as rotating around an atom. The poor spatial reasoning of LLMs limits their usefulness in crystallography. Addressing this problem is a necessary first step towards enabling higher-level tasks such as recognizing motifs and symmetries, repairing or validating complex structures, and proposing novel structures.
Submission Track: Paper Track (Short Paper)
Submission Category: AI-Guided Design
Supplementary Material: pdf
Institution Location: {Sydney, Australia}, {Hefei, China}
AI4Mat Journal Track: Yes
AI4Mat RLSF: Yes
Submission Number: 51
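
To make the "rotate around an atom" task from the abstract concrete, the following is a minimal sketch of what such an operation involves. It is not the AtomWorld implementation: the function name `rotate_about_atom`, its parameters, and the restriction to an orthorhombic cell (all cell angles 90 degrees) with rotation about the z-axis are assumptions made purely for illustration.

```python
import math


def rotate_about_atom(frac_coords, pivot_index, angle_deg, cell=(1.0, 1.0, 1.0)):
    """Rotate every atom about the z-axis passing through a pivot atom.

    Hypothetical helper for illustration only.
    Assumes an orthorhombic cell (lengths a, b, c; all angles 90 degrees),
    so fractional and Cartesian coordinates differ only by axis scaling.
    Results are not wrapped back into the unit cell.
    """
    a, b, _c = cell
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # Pivot position in Cartesian coordinates (x and y only; z is unchanged).
    px = frac_coords[pivot_index][0] * a
    py = frac_coords[pivot_index][1] * b

    rotated = []
    for fx, fy, fz in frac_coords:
        # Displacement from the pivot in Cartesian coordinates.
        dx = fx * a - px
        dy = fy * b - py
        # Standard 2D rotation of the displacement, then shift back.
        rx = px + cos_t * dx - sin_t * dy
        ry = py + sin_t * dx + cos_t * dy
        # Convert back to fractional coordinates.
        rotated.append((rx / a, ry / b, fz))
    return rotated


if __name__ == "__main__":
    atoms = [(0.5, 0.5, 0.5), (0.75, 0.5, 0.5)]
    print(rotate_about_atom(atoms, pivot_index=0, angle_deg=90.0, cell=(4.0, 4.0, 4.0)))
```

Even in this simplified setting the task requires converting between fractional and Cartesian coordinates, translating to the pivot, rotating, and translating back. A general cell would additionally need the full lattice matrix, which hints at why such tasks are harder than inserting or moving a single atom.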