Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling

Published: 18 Sept 2025, Last Modified: 29 Oct 2025, NeurIPS 2025 poster, CC BY 4.0
Keywords: Optimization, Optimization Problem Formulation, Problem Definition, Foundation Model
TL;DR: We propose SIRL, the first application of Reinforcement Learning with Verifiable Rewards (RLVR) to directly enhance LLMs' proficiency in optimization modeling; it achieves state-of-the-art results on diverse public benchmarks.
Abstract: Optimization modeling is fundamental to decision-making in fields such as supply chain management, logistics, and financial engineering, but its complexity is a major barrier to adoption. Automating model creation from natural language is key to improving efficiency and accessibility. While Large Language Models (LLMs) are a promising tool for this, they often produce flawed or infeasible results due to errors and hallucinations. To address this issue, we propose Solver-Informed Reinforcement Learning (SIRL), a framework that uses Reinforcement Learning with Verifiable Rewards (RLVR) to improve LLMs' ability to generate accurate and executable optimization models. Specifically, SIRL automatically assesses both the executable code and the instance-level mathematical model represented by the associated .lp files. This process yields precise feedback on syntactic validity, feasibility, and solution quality, which serves as a direct reward signal to guide reinforcement learning. The same verification mechanism supports our instance-enhanced self-consistency method for creating high-quality training data. Extensive experiments on diverse public benchmarks show that models trained with SIRL achieve state-of-the-art performance, substantially outperforming existing methods in generating accurate and executable optimization models. Notably, our SIRL-32B model surpasses DeepSeek-V3 and OpenAI-o3 on the majority of these benchmarks. Our code is publicly available at https://github.com/Cardinal-Operations/SIRL.
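As a rough illustration of how solver feedback can be turned into a scalar reward, the Python sketch below scores one generated model by whether its code executed, what status the solver reported, and whether the optimal objective matches a reference value. Everything here is a hypothetical stand-in, not SIRL's actual reward: the `SolverOutcome` type, the status strings, the tier values (0.0 / 0.1 / 0.2 / 1.0), and the tolerances. The `.lp`-level instance checks described above are omitted. A plain majority vote over objective values is included to show the basic idea of self-consistency. SIRL's instance-enhanced variant builds on the instance-level verification and is not reproduced here.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class SolverOutcome:
    """Hypothetical summary of running one generated model through a solver."""
    executed: bool              # did the generated code run without raising?
    status: Optional[str]       # e.g. "OPTIMAL", "INFEASIBLE"; None if not executed
    objective: Optional[float]  # objective value when status == "OPTIMAL"


def solver_reward(outcome: SolverOutcome, reference: float,
                  rel_tol: float = 1e-6) -> float:
    """Tiered verifiable reward; the tier values are illustrative assumptions."""
    if not outcome.executed:
        return 0.0  # code is syntactically invalid or crashed
    if outcome.status != "OPTIMAL" or outcome.objective is None:
        return 0.1  # runs, but the solver found no optimal solution
    if math.isclose(outcome.objective, reference, rel_tol=rel_tol, abs_tol=1e-9):
        return 1.0  # optimum agrees with the ground-truth value
    return 0.2      # solved to optimality, but the objective is wrong


def majority_objective(objectives: list[Optional[float]],
                       ndigits: int = 4) -> Optional[float]:
    """Plain self-consistency: majority vote over the solved objective values."""
    solved = [round(v, ndigits) for v in objectives if v is not None]
    if not solved:
        return None  # no sample produced an optimal solution
    value, _count = Counter(solved).most_common(1)[0]
    return value


if __name__ == "__main__":
    print(solver_reward(SolverOutcome(False, None, None), reference=42.0))
    print(solver_reward(SolverOutcome(True, "INFEASIBLE", None), reference=42.0))
    print(solver_reward(SolverOutcome(True, "OPTIMAL", 42.0), reference=42.0))
    print(majority_objective([42.0, 41.99999, 17.0, None]))
```

The tiers are ordered so that executable code earns more than crashing code, and a correct optimum earns far more than any partial outcome. This reflects the ordering of syntactic validity, feasibility, and solution quality named in the abstract. Rounding before voting keeps near-identical floating-point optima from splitting the vote.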
Primary Area: Optimization (e.g., convex and non-convex, stochastic, robust)
Submission Number: 15718