Benchmarking Text Representations for Crystal Structure Generation with Large Language Models

Published: 09 Apr 2025 (Last Modified: 09 Apr 2025) · AI4MAT-ICLR-2025 Poster · CC BY 4.0
Submission Track: Full Paper
Submission Category: AI-Guided Design
Keywords: materials discovery, generative models, large language models
TL;DR: We benchmark the performance of several textual representations of structures with different levels of invariances and invertibility for crystal structure generation via fine-tuning of LLMs.
Abstract: The discovery of novel materials is essential for scientific and technological advancement but remains a significant challenge due to the vastness of the chemical space. Large language models (LLMs) have shown particular promise as generative models for materials discovery, where novel materials are generated in the form of textual representations of their crystal structures. In this work, we benchmark the performance of several textual representations with different levels of invariance and invertibility for crystal structure generation, covering Cartesian, Z-matrix, distance matrix, and SLICES representations. We find that all representations can be effectively leveraged by LLMs for structure generation. However, contrary to expectations, we observe that building translation and rotation invariance into more complex representations does not necessarily yield better generation performance. These findings suggest that established design principles for conventional structure representations do not apply to LLMs. This study establishes the first benchmark for textual representations in crystal structure generation using fine-tuned LLMs, offering a foundation for accelerating materials discovery with language models.
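To make the idea of a textual crystal representation concrete, here is a minimal sketch (not the paper's exact format, and the function name and rounding choices are illustrative assumptions) of how a Cartesian-style encoding might serialize a structure's lattice parameters and atomic coordinates into plain text suitable for LLM fine-tuning:

```python
# Illustrative sketch only: a Cartesian-style text encoding of a crystal.
# Lattice lengths/angles and fractional coordinates are rounded to a fixed
# number of digits to keep the token count of the representation low.

def crystal_to_text(lengths, angles, species, frac_coords, ndigits=2):
    """Encode a crystal as plain text: one lattice line, then one line per atom."""
    lattice = " ".join(f"{x:.{ndigits}f}" for x in (*lengths, *angles))
    atoms = "\n".join(
        f"{el} " + " ".join(f"{c:.{ndigits}f}" for c in xyz)
        for el, xyz in zip(species, frac_coords)
    )
    return lattice + "\n" + atoms

# Rock-salt NaCl (a = 5.64 Å), reduced here to two sites for brevity.
text = crystal_to_text(
    lengths=(5.64, 5.64, 5.64),
    angles=(90.0, 90.0, 90.0),
    species=["Na", "Cl"],
    frac_coords=[(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)],
)
print(text)
# 5.64 5.64 5.64 90.00 90.00 90.00
# Na 0.00 0.00 0.00
# Cl 0.50 0.50 0.50
```

Note that such an encoding is invertible (the structure can be recovered from the text) but not invariant to translations or rotations of the cell contents; the paper's comparison spans representations that trade these properties off in different ways.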
AI4Mat Journal Track: Yes
Submission Number: 35