Discovering Out-of-Distribution Superconductors via Reinforcement Learning and Model Merging

Po-Yen Tung; David S.D. Gunn; Richard Tomsett; Jonathon Frederick Shiv Markanday; Robert M Forrest; Dr Jonathan Bean

Published: 02 Mar 2026, Last Modified: 08 Apr 2026. Venue: AI4Mat-ICLR-2026 Spotlight. License: CC BY 4.0

Keywords: superconductors, inverse materials design, reinforcement learning, model merging, diffusion-based generative models, out-of-distribution exploration

Abstract: Inverse design of superconducting materials requires generative models that can optimize functional properties without collapsing to narrow regions of known chemistry. We study this problem using diffusion-based generative models fine-tuned via reinforcement learning (RL) with surrogate feedback, where a graph neural network predictor of the superconducting critical temperature provides the reward signal. Compared with classifier-free guidance (CFG), RL increases the average surrogate-predicted critical temperature from 10 K to 17 K. To address diversity collapse during feedback-based optimization, we further merge independently fine-tuned models, which increases the fraction of unique chemical systems by over 7% relative to single-model fine-tuning. We quantify exploration using a calibrated out-of-distribution metric, OOD@$\alpha$, and find that the merged RL model generates >92% of samples beyond the 95th percentile of the calibration distribution (OOD@0.05), versus 21% for CFG-based baselines. This increased exploration does not degrade structural validity: S.U.N. (stable, unique, and novel) rates improve from 0.19 under CFG to 0.57 under RL. Overall, RL combined with model merging provides a practical, high-potential framework for discovering new series of genuinely out-of-distribution superconductors and other critical materials.
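As an illustration of the OOD@$\alpha$ idea described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes each material has already been reduced to a single scalar novelty score (the abstract does not say how that score is computed), and it reads OOD@$\alpha$ as the fraction of generated samples whose score exceeds the $(1-\alpha)$ quantile of the calibration scores. The function name `ood_at_alpha` and the toy data are hypothetical.

```python
import numpy as np


def ood_at_alpha(calibration_scores, sample_scores, alpha=0.05):
    """Fraction of samples scoring above the (1 - alpha) calibration quantile.

    Assumption: higher score means "further from the known distribution".
    How the score itself is defined is not given in the abstract.
    """
    calibration_scores = np.asarray(calibration_scores, dtype=float)
    sample_scores = np.asarray(sample_scores, dtype=float)
    # For alpha = 0.05 this is the 95th percentile of the calibration set.
    threshold = np.quantile(calibration_scores, 1.0 - alpha)
    return float(np.mean(sample_scores > threshold))


# Toy data (hypothetical): calibration scores 1..100, four generated samples.
calibration = np.arange(1, 101)
generated = np.array([50.0, 96.0, 99.0, 120.0])
fraction = ood_at_alpha(calibration, generated, alpha=0.05)
print(f"OOD@0.05 = {fraction:.2f}")
```

Under this reading, a model that merely reproduces the calibration distribution would score close to $\alpha$ itself, which is what makes the reported >92% versus 21% comparison interpretable.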

Submission Track: Feedback-Based Learning for Materials Design - Full Paper

Submission Category: AI-Guided Design

Submission Number: 62
