PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models

ICLR 2026 Conference Submission15296 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Reasoning, Efficient Inference Methods, Problem Solving
TL;DR: We propose PREMISE, a framework that optimizes large language model reasoning to be shorter and cheaper without sacrificing accuracy.
Abstract: Large Reasoning Models (LRMs) like Claude 3.7 Sonnet and OpenAI o1 achieve strong performance on mathematical tasks via long Chain-of-Thought (CoT), but often generate unnecessarily verbose reasoning traces. This inflates token usage and cost, limiting deployment in latency-sensitive or API-constrained settings. We present \textbf{PREMISE} (\textit{PRompt-based Efficient Mathematical Inference with Strategic Evaluation}), a prompt-only framework designed specifically for black-box commercial LRMs. PREMISE reduces reasoning overhead without modifying model weights or requiring multiple queries. It combines trace-level diagnostics with gradient-based prompt optimization to minimize redundant computation while preserving answer accuracy. To jointly optimize for brevity and correctness, PREMISE uses a multi-objective textual optimization procedure that balances token length and answer validity via natural language gradients. Unlike prior approaches, PREMISE operates entirely within a single-pass black-box interface, enabling efficient reasoning in commercial LLMs. Across GSM8K, SVAMP, and MATH500, PREMISE achieves an average accuracy of 94.7\% while reducing reasoning tokens by up to \textbf{84.3\%} and cutting dollar cost by \textbf{82.2\%}. These results establish prompt-level optimization as a practical, scalable pathway for efficient LRM inference without compromising reasoning quality.
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 15296