Preference Learning from Physics-Based Feedback: Tuning Language Models to Design BCC/B2 Superalloys
Keywords: Direct preference optimization, preference learning, materials science, alloys, language models
TL;DR: We use preference learning to tune open-weight language models to generate candidate compositions for BCC/B2 superalloys
Abstract: We apply preference learning to the task of generating novel structural alloys with language models. Where prior work focuses on generating stable inorganic crystals, our approach optimizes for the synthesizability of a specific structural class: BCC/B2 superalloys, an underexplored family of materials with applications in extreme environments. Using three open-weight models (LLaMA-3.1, Gemma-2, and OLMo-2), we demonstrate that language models can be optimized for multiple design objectives using a single, unified reward signal through Direct Preference Optimization (DPO). Our reward signal is derived from thermodynamic phase calculations, providing scientifically grounded feedback for model tuning. To our knowledge, this is the first demonstration of preference-tuning a language model using physics-grounded feedback for targeted properties (in our case, BCC/B2 alloys). The resulting framework is general and adaptable to any design problem for which the design space is enumerable and simulation-based feedback is available.
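To make the training setup concrete, the sketch below shows the standard DPO objective (Rafailov et al., 2023) together with one way physics-based scores could be turned into preference pairs. This is a minimal illustration, not the authors' code: `phase_score` is a hypothetical placeholder for a scalar reward from thermodynamic phase calculations, and the pairing heuristic (best vs. worst candidate per prompt) is an assumption.

```python
# Minimal DPO sketch. Assumes summed token log-probabilities for each
# (prompt, completion) pair are available from the policy and a frozen
# reference model; `phase_score` is a hypothetical physics-based scorer.
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss.

    Each argument is a 1-D tensor of summed log-probabilities over a batch
    of completions under the policy or the frozen reference model.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the reward margin between preferred and dispreferred outputs.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()


def build_preference_pairs(prompt, candidates, phase_score):
    """Rank sampled compositions with a physics-based scorer and pair the
    best against the worst to form (prompt, chosen, rejected) triples.

    `phase_score` is assumed to map a candidate composition to a scalar
    derived from phase calculations (higher = more likely BCC/B2).
    """
    ranked = sorted(candidates, key=phase_score, reverse=True)
    return [(prompt, ranked[0], ranked[-1])]
```

In this formulation, any enumerable design space with a simulation-based scorer can supply the preference pairs, which is what makes the single unified reward signal possible.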
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 20860