Keywords: Large language models, materials synthesis, retrieval-augmented generation, multi-agent systems, test-time compute, inorganic materials
TL;DR: We benchmark RAG, thermodynamic tools, and multi-agent workflows for LLM-based solid-state synthesis prediction, finding that retrieval of similar recipes outperforms both tool augmentation and test-time compute strategies.
Abstract: Identifying synthesis recipes for new inorganic materials remains a major bottleneck in materials discovery. We investigate whether large language models (LLMs) can improve solid-state synthesis prediction through three augmentation strategies: retrieval-augmented generation (RAG) from the literature, domain-specific thermodynamic tools, and multi-step test-time-compute workflows such as debate, self-reflection, and sequential pipelines. Evaluating on 674 literature-derived targets, we find that retrieving relevant synthesis precedents is the most effective strategy, improving top-10 precursor accuracy from 77.0\% to 83.5\%. Thermodynamic tools also improve performance (80.6\%) but provide little additional benefit when RAG is already used (82.9\% on Gemini 3 Flash, 77.5\% on Gemini 2 Flash). By contrast, test-time compute does not improve performance, and sequential multi-agent workflows often reduce accuracy because errors introduced in earlier stages propagate downstream, causing later steps to mis-rank candidates or overwrite correct answers. Our results show that, for solid-state synthesis prediction, providing models with relevant domain information is more effective than increasing test-time compute through multi-agent deliberation.
Submission Track: Paper Track (Tiny Paper)
Submission Category: Automated Synthesis
Submission Number: 52