ORGEval: Graph-Theoretic Evaluation of LLMs in Optimization Modeling

TMLR Paper8613 Authors

25 Apr 2026 (modified: 18 May 2026), under review for TMLR, CC BY 4.0

Abstract: Formulating optimization problems demands substantial manual effort and specialized domain expertise. While Large Language Models (LLMs) have shown promise for automating this process, evaluating the correctness of their outputs remains challenging due to the lack of reliable evaluation metrics. Existing solver-based evaluation methods lack rigorous correctness guarantees, become uninformative when the formulated models are infeasible, and incur prohibitive computational costs on hard instances. To address these limitations, we propose ORGEval, a graph-theoretic evaluation framework for assessing LLMs' capabilities in formulating linear programs and mixed-integer linear programs (MILPs). ORGEval represents optimization instances as bipartite graphs, thereby reducing equivalence detection to graph isomorphism (GI) testing. The Weisfeiler-Lehman (WL) test is a classical heuristic for GI, but it is known to yield false positives on certain graph structures. We identify a sufficient condition, called symmetric decomposability (SD), under which the WL test is guaranteed to correctly determine isomorphism. Building on this result, ORGEval combines the WL test for bipartite graphs with an efficient SD verification procedure to provide provably correct equivalence evaluation. We further introduce Bench4Opt, a benchmark dataset that separates models from data, to validate ORGEval and to benchmark state-of-the-art LLMs on optimization modeling. Experimental results demonstrate that ORGEval reliably detects equivalence while running significantly faster than solver-based methods, particularly on computationally challenging instances. Our benchmark reveals that optimization modeling remains a challenging task for all tested LLMs, with the best-performing models achieving only 54.82% accuracy.
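To make the reduction concrete, here is a minimal sketch of a WL-style equivalence check on the bipartite graph of a small LP, written under stated assumptions. The helper names `lp_graph`, `wl_histograms`, and `wl_equivalent` are hypothetical; the node and edge labelling (variables labelled by objective coefficient and type, constraints by right-hand side and sense, edges weighted by the matrix coefficient) is an assumed encoding, not necessarily the paper's exact construction; both models are assumed to share the same objective sense; and "equivalent" is taken to mean identical up to reordering of variables and constraints. The symmetric-decomposability check is not implemented here, so a `True` result only means the WL test cannot tell the two models apart, which is exactly the case where false positives can occur.

```python
from collections import Counter


def lp_graph(c, A, b, senses, var_types):
    """Hypothetical helper: encode an LP/MILP as a labelled bipartite graph.

    Assumed encoding: one node per variable, one per constraint, and an edge
    (constraint i, variable j, A[i][j]) for every nonzero coefficient.
    The objective sense (min/max) is not encoded; both models must share it.
    """
    var_labels = [("var", cj, t) for cj, t in zip(c, var_types)]
    con_labels = [("con", bi, s) for bi, s in zip(b, senses)]
    edges = [(i, j, a) for i, row in enumerate(A) for j, a in enumerate(row) if a != 0]
    return var_labels, con_labels, edges


def wl_histograms(graphs, rounds):
    """1-WL colour refinement with one shared palette, so colour ids are comparable across graphs."""
    palette = {}

    def color(signature):
        # Assign a fresh integer id to each previously unseen signature.
        return palette.setdefault(signature, len(palette))

    states = []
    for var_labels, con_labels, edges in graphs:
        v_nbrs = [[] for _ in var_labels]
        c_nbrs = [[] for _ in con_labels]
        for i, j, a in edges:
            c_nbrs[i].append((j, a))
            v_nbrs[j].append((i, a))
        vcol = [color(lab) for lab in var_labels]
        ccol = [color(lab) for lab in con_labels]
        states.append((v_nbrs, c_nbrs, vcol, ccol))

    for _ in range(rounds):
        refined = []
        for v_nbrs, c_nbrs, vcol, ccol in states:
            # New colour = (old colour, sorted multiset of (edge weight, neighbour colour)).
            # Both sides are updated from the previous round's colours.
            new_c = [color((ccol[i], tuple(sorted((a, vcol[j]) for j, a in nb))))
                     for i, nb in enumerate(c_nbrs)]
            new_v = [color((vcol[j], tuple(sorted((a, ccol[i]) for i, a in nb))))
                     for j, nb in enumerate(v_nbrs)]
            refined.append((v_nbrs, c_nbrs, new_v, new_c))
        states = refined
    return [Counter(vcol) + Counter(ccol) for _, _, vcol, ccol in states]


def wl_equivalent(g1, g2):
    """True if 1-WL cannot distinguish g1 and g2 (necessary, not sufficient, for isomorphism)."""
    # The joint colouring of both graphs stabilises within (total node count) rounds.
    rounds = sum(len(g[0]) + len(g[1]) for g in (g1, g2))
    h1, h2 = wl_histograms([g1, g2], rounds)
    return h1 == h2


# max 3x + 2y  s.t.  x + y <= 4,  x + 3y <= 6   (x, y continuous)
m1 = lp_graph([3, 2], [[1, 1], [1, 3]], [4, 6], ["<=", "<="], ["C", "C"])
# Same model with the variables (y, x) and the constraints listed in a different order.
m2 = lp_graph([2, 3], [[3, 1], [1, 1]], [6, 4], ["<=", "<="], ["C", "C"])
# A genuinely different model: the coefficient of y in the second constraint is 2.
m3 = lp_graph([3, 2], [[1, 1], [1, 2]], [4, 6], ["<=", "<="], ["C", "C"])
print(wl_equivalent(m1, m2))  # True
print(wl_equivalent(m1, m3))  # False
```

Sharing one palette across both graphs is what makes the final colour histograms directly comparable; since each colour also encodes the node's previous colour, equal final histograms imply equal histograms at every earlier round.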

Submission Type: Regular submission (no more than 12 pages of main content)

Previous TMLR Submission URL: https://openreview.net/forum?id=u1WoNw4rhZ

Changes Since Last Submission: Revised the format.

Assigned Action Editor: ~Mingrui_Liu2

Submission Number: 8613