PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Reasoning Models
Keywords: Reasoning, Efficient Inference Methods, Problem Solving
TL;DR: We propose PREMISE, a prompt-optimization framework that makes large reasoning models' chains of thought shorter and cheaper without sacrificing accuracy.
Abstract: Large Reasoning Models (LRMs) like Claude 3.7 Sonnet and OpenAI o1 achieve strong performance on mathematical tasks via long Chain-of-Thought (CoT) reasoning, but often generate unnecessarily verbose reasoning traces. This inflates token usage and cost, limiting deployment in latency-sensitive or API-constrained settings. To address this issue, we present PREMISE (PRompt-based Efficient Mathematical Inference with Strategic Evaluation), an optimization framework designed specifically for black-box commercial LRMs. PREMISE reduces reasoning overhead without modifying model weights or requiring multiple queries. It combines trace-level diagnostics with gradient-based prompt optimization to minimize redundant computation while preserving answer accuracy. Across GSM8K, SVAMP, and Math500, PREMISE matches or exceeds baseline accuracy while reducing reasoning tokens by up to 87.5% and cutting dollar cost by 69–82%.
Submission Number: 127