Language Models Enable Data-Augmented Inorganic Materials Synthesis Planning

Thorben Prein; Elton Pan; Janik Jehkul; Steffen Weinmann; Elsa Olivetti; Jennifer L.M. Rupp

Published: 20 Sept 2025, Last Modified: 05 Nov 2025. Venue: AI4Mat-NeurIPS-2025 Spotlight. License: CC BY 4.0

Keywords: LLMs, inorganic synthesis, synthesis condition prediction, precursor recommendation

TL;DR: We propose a hybrid workflow combining LLMs and task-specific models to enable scalable, data-efficient inorganic synthesis planning.

Abstract: Inorganic synthesis planning has largely relied on heuristic strategies or machine-learning models trained on limited datasets, which restricts generality. We show that general-purpose language models, without task-specific fine-tuning, can recall synthesis conditions reported in the scientific literature. Off-the-shelf models, including GPT-4.1, Gemini 2.0 Flash, and Llama 4 Maverick, reach Top-1 precursor-prediction accuracy of up to 53.8% and Top-5 accuracy of 66.8% on a held-out set of 1,000 reactions. They also predict calcination and sintering temperatures with mean absolute errors below 126 °C, matching or surpassing specialized regression baselines. Ensembling these language models further improves predictive accuracy and cuts inference cost per prediction by up to 70%. To assess whether the broad, cross-domain knowledge of language models transfers to task-specific models, we train a transformer, SyntMTE, on 28,548 LM-generated reaction recipes. Relative to a model trained on literature-reported data, a model trained solely on LM-generated data attains competitive performance (only 6% lower). Moreover, training on both LM-generated and literature-reported data yields up to a 4% improvement. In a case study on Li7La3Zr2O12 solid-state electrolytes, SyntMTE reproduces experimentally observed dopant-dependent sintering trends. Together, these results establish a hybrid workflow for scalable, data-efficient inorganic synthesis planning. This non-archival workshop paper summarizes work currently under review at ACS Applied Materials & Interfaces; portions of the text and figures are adapted from that manuscript.
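The abstract reports Top-1/Top-5 precursor-prediction accuracy and mean absolute error (MAE) for temperatures. As a minimal sketch of how such metrics are commonly computed, the Python below counts a prediction as correct when the reference precursor set exactly matches one of the first k ranked candidate sets. The exact-match criterion, the function names, and the toy data are assumptions made for illustration; they are not taken from the paper and may differ from the authors' evaluation protocol.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[Sequence[str]]],
    references: Sequence[Sequence[str]],
    k: int,
) -> float:
    """Fraction of targets whose reference precursor set appears among
    the first k ranked candidate sets (assumed exact set match)."""
    if not references or len(ranked_predictions) != len(references):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = 0
    for candidates, reference in zip(ranked_predictions, references):
        reference_set = frozenset(reference)
        # Order of precursors inside a set is ignored; rank order is not.
        if any(frozenset(c) == reference_set for c in candidates[:k]):
            hits += 1
    return hits / len(references)


def mean_absolute_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Mean absolute difference between predicted and observed values."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical toy data: two targets, each with two ranked candidate sets.
ranked = [
    [["Li2CO3", "La2O3", "ZrO2"], ["LiOH", "La2O3", "ZrO2"]],
    [["BaCO3", "TiO2"], ["BaO", "TiO2"]],
]
refs = [["LiOH", "La2O3", "ZrO2"], ["TiO2", "BaCO3"]]

top1 = top_k_accuracy(ranked, refs, k=1)
top5 = top_k_accuracy(ranked, refs, k=5)
# Hypothetical sintering temperatures in degrees Celsius.
mae = mean_absolute_error([1200.0, 900.0], [1150.0, 1000.0])
```

In this toy example the first target is only recovered at rank 2, so it counts toward Top-5 but not Top-1, which illustrates why Top-5 accuracy is never lower than Top-1.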

Submission Track: Paper Track (Full Paper)

Submission Category: Automated Synthesis

Supplementary Material: pdf

Institution Location: Munich, Germany; Cambridge, United States

Submission Number: 24
