Keywords: Optimization Generalization, Reasoning to Model and Solve, Limited Resources, Large Language Models for Optimization
Abstract: Modeling and solving optimization problems via large language models (LLMs) has attracted increasing attention recently. Although both prompt-based and learning-based methods have made progress, they remain limited by their reliance on large volumes of data, high-quality annotations, expensive verification of intermediate steps, and substantial computational overhead. From a data-privacy perspective, low-cost local deployment of small-scale LLMs is of significant value. To train a small-scale LLM with strong optimization generalization under limited resources, this paper proposes MiniOpt, a reasoning-to-model-and-solve paradigm based on reinforcement learning (RL) with verifiable rewards. To reduce the demand for training data, MiniOpt adopts two-stage RL training: in the first stage the model quickly learns the model-and-solve paradigm, and in the second stage it acquires strong optimization generalization. To reduce the cost of verifying LLM responses, OptReward in MiniOpt checks the completeness of the problem modeling, avoiding the need for content validation. These techniques enable training small-scale LLMs with strong optimization generalization under limited resources, which in turn yields low inference cost for local deployment and use. Extensive experiments show that MiniOpt-3B generalizes well across various optimization types and scenarios. Among models with fewer than 10B parameters, MiniOpt-3B achieves the highest average solving accuracy (SA), and it remains competitive with models of more than 10B parameters. Notably, MiniOpt-3B attains superior SA on the hard OptMATH-Bench while consuming only 37.64% of the average output tokens required by DeepSeek-R1. The code is available at https://anonymous.4open.science/r/MiniOpt-6194.
Primary Area: optimization
Submission Number: 17247