AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library

ICLR 2026 Conference Submission20929 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: LLM, Optimization, Operations Research, Library learning

TL;DR: We propose a self-improving experience library that turns failed attempts on optimization tasks into structured, reusable insights to automate optimization formulation, outperforming state-of-the-art baselines across benchmarks.

Abstract: Optimization modeling enables critical decisions across industries but remains hard to automate: informal language must be mapped to precise mathematical formulations and executable solver code, while prior LLM approaches rely either on brittle prompting or on costly retraining with limited generalization. We present \textbf{AlphaOPT}, a self-improving \emph{experience library} that enables an LLM to learn from limited demonstrations (i.e., even from answers alone, without gold-standard programs) and solver feedback, without annotated reasoning traces or parameter updates. AlphaOPT operates a continual two-phase cycle: (i) a \emph{Library Learning} phase that reflects on failed attempts and extracts solver-verified, structured insights as $\{\textit{taxonomy},\ \textit{condition},\ \textit{explanation},\ \textit{example}\}$; and (ii) a \emph{Library Evolution} phase that diagnoses retrieval misalignments and refines the applicability conditions of stored insights, improving transfer across tasks. This design (1) learns efficiently from limited demonstrations without curated rationales, (2) expands continually without costly retraining by updating the library rather than model weights, and (3) makes knowledge explicit and interpretable for human inspection and intervention. Experiments show that AlphaOPT steadily improves with more data (65\% $\rightarrow$ 72\% from 100 to 300 training items) and surpasses the strongest baseline by 7.7\% on the out-of-distribution OptiBench dataset when trained only on answers.
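The two-phase cycle described in the abstract can be pictured with a small toy sketch. It is not the authors' implementation: the class and function names (`Insight`, `ExperienceLibrary`, `learn_from_failure`, `refine`) are hypothetical, the applicability condition is modeled as a plain keyword set rather than anything LLM-generated, and a simple numeric comparison stands in for solver verification. Only the four insight fields come from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Insight:
    """One library entry with the four fields named in the abstract."""
    taxonomy: str      # category label for the insight
    condition: set     # ASSUMPTION: keywords that must all appear in a task
    explanation: str   # why the insight matters
    example: str       # short illustration of applying it


@dataclass
class ExperienceLibrary:
    insights: list = field(default_factory=list)

    def add(self, insight):
        self.insights.append(insight)

    def retrieve(self, task_text):
        """Return insights whose condition keywords all occur in the task text."""
        words = set(task_text.lower().split())
        return [i for i in self.insights if i.condition <= words]

    def refine(self, insight, extra_keyword):
        """Library Evolution stand-in: narrow a condition that matched an
        unsuitable task by requiring one more keyword."""
        insight.condition = insight.condition | {extra_keyword.lower()}


def learn_from_failure(library, predicted, gold, candidate, tol=1e-6):
    """Library Learning stand-in: store the candidate insight only when the
    attempt's objective value disagrees with the known answer."""
    if abs(predicted - gold) > tol:
        library.add(candidate)
        return True
    return False
```

In this sketch, an insight conditioned on `{"integer"}` would be retrieved for any task mentioning that word; calling `refine(insight, "minimize")` afterwards restricts it to tasks that also mention minimization, which loosely mirrors refining applicability conditions after a retrieval misalignment. Because the library is ordinary data, it can be inspected or edited by hand without touching model weights.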

Primary Area: optimization

Submission Number: 20929
