\section{Introduction}
\label{sec:introduction}

The oxygen evolution reaction (OER) is a bottleneck in water splitting because of its sluggish four-electron kinetics, which limit clean hydrogen production \cite{friedlingstein2024global}. Although IrO$_2$ and RuO$_2$ achieve overpotentials of 320--370~mV, the scarcity of Ir and Ru motivates the exploration of high-entropy alloys (HEAs) \cite{he2023threedfourfivehea}. However, the $10^{60}$ possible five-component combinations and 10--20-year discovery cycles demand approaches beyond traditional high-throughput screening \cite{ulissi2017machine}.

We demonstrate that large language models (LLMs), despite lacking chemistry-specific training, can discover high-performance catalysts when grounded through retrieval-augmented generation (RAG) \cite{lewis2020retrieval}. GPT-4's implicit chemical knowledge, acquired from its training corpora \cite{microsoft2023impact,bran2024chemcrow}, combined with RAG access to more than 50,000 validated materials, enables directed exploration without fine-tuning. Unlike graph neural networks, which require more than $10^6$ training samples \cite{schnet2017,mai2023graph}, our approach leverages pre-existing knowledge through structured prompts that encode design rules.
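To make the retrieval-and-prompting step concrete, a minimal sketch follows. It is illustrative only and is not the pipeline used in this work: the toy \texttt{embed} function (a bag-of-elements vector over mole fractions) is an assumed stand-in for a learned embedding model, retrieval is a brute-force cosine-similarity ranking rather than a vector index, and the names \texttt{embed}, \texttt{retrieve}, and \texttt{build\_prompt}, the example compositions, and the design rule are all hypothetical.

```python
import math
import re
from typing import Dict, List


def embed(composition: str) -> Dict[str, float]:
    """Hypothetical toy embedding: map each element to its mole fraction.

    Parses strings such as 'Fe0.2Co0.8'; a real system would use a learned
    text or materials embedding instead.
    """
    vec: Dict[str, float] = {}
    for elem, frac in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)", composition):
        vec[elem] = vec.get(elem, 0.0) + float(frac)
    return vec


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse element vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, database: List[str], k: int = 3) -> List[str]:
    """Return the k stored compositions most similar to the query (brute force)."""
    q = embed(query)
    ranked = sorted(database, key=lambda comp: cosine(q, embed(comp)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, retrieved: List[str], rules: List[str]) -> str:
    """Assemble a structured prompt: design rules, retrieved context, then the task."""
    lines = ["Design rules:"] + [f"- {r}" for r in rules]
    lines.append("Retrieved reference compositions:")
    lines += [f"- {comp}" for comp in retrieved]
    lines.append(f"Propose a modification of {query} that lowers the OER overpotential.")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder compositions for illustration only (no property data attached).
    database = ["Fe0.5Ni0.5", "Co0.5Ni0.5", "Ir0.5Ru0.5", "Fe0.2Co0.2Ni0.2Ir0.2Ru0.2"]
    rules = ["Prefer compositions whose retrieved analogues are stable."]  # hypothetical rule
    query = "Fe0.2Co0.2Ni0.2Ir0.1Ru0.3"
    print(build_prompt(query, retrieve(query, database, k=2), rules))
```

In a full pipeline the assembled string would be sent to the LLM; the model call and any vector-index backend are omitted here so that the sketch stays self-contained.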

\textbf{Key contributions:} (1) the first LLM-driven catalyst discovery without fine-tuning, yielding more than 250 HEAs with an 82\% stability rate; (2) a $200\times$ gain in computational efficiency through RAG integration, matching GNN performance (mean $\eta = 0.352$~V) with zero training data; (3) a best catalyst, Fe$_{0.2}$Co$_{0.2}$Ni$_{0.2}$Ir$_{0.1}$Ru$_{0.3}$, that achieves an overpotential of 0.285~V, 25\% better than IrO$_2$; (4) experimental validation of 10 candidates, confirming DFT accuracy (Spearman $\rho = 0.89$); and (5) democratized discovery through a natural-language interface that enables non-specialists to design materials.

