How Far Are We from Optimal Reasoning Efficiency?

Published: 18 Sept 2025 · Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Efficient Reasoning; Large Reasoning Models; Reinforcement Learning for Reasoning
TL;DR: This paper introduces a novel metric (REG) for evaluating the reasoning efficiency of LRMs and a reinforcement learning method (REO-RL) that significantly reduces reasoning redundancy while maintaining accuracy.
Abstract: Large Reasoning Models (LRMs) demonstrate remarkable problem-solving capabilities through extended Chain-of-Thought (CoT) reasoning but often produce excessively verbose and redundant reasoning traces. This inefficiency incurs high inference costs and limits practical deployment. While existing fine-tuning methods aim to improve reasoning efficiency, assessing their efficiency gains remains challenging due to inconsistent evaluations. In this work, we introduce the ***reasoning efficiency frontiers***, empirical upper bounds derived from fine-tuning a base LRM (DeepSeek-R1-Distill-Qwen-1.5B/7B) across diverse approaches and training configurations. Based on these frontiers, we propose the ***Reasoning Efficiency Gap (REG)***, a unified metric quantifying how far any fine-tuned LRM deviates from these frontiers. Systematic evaluation on challenging mathematical benchmarks, AMC23, AIME24, and AIME25, reveals significant gaps in current methods: they either sacrifice accuracy for shorter outputs or use excessive tokens to reach sub-optimal accuracies despite high overall accuracy. To reduce this gap, we propose ***REO-RL***, a reinforcement learning algorithm that optimizes reasoning efficiency by targeting a sparse set of token budgets. Leveraging numerical integration over strategically selected budgets, REO-RL approximates the full efficiency objective with low error using only a small set of token budgets. Experiments show that, compared to vanilla RL with outcome reward, REO-RL reduces the reasoning efficiency gap by 74.5% and 64.2% in the 1.5B and 7B settings, respectively. The 7B LRM fine-tuned with REO-RL achieves reasoning conciseness surpassing frontier LRMs such as Qwen3 and Claude Sonnet 3.7. Ablation studies confirm the efficacy of our token budget strategy and highlight REO-RL's flexibility across design choices. This work establishes a systematic framework for evaluating and optimizing reasoning efficiency in LRMs. We will release the related code, data, and models to support future research on efficient reasoning in LRMs.
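A minimal sketch of the idea behind a REG-style metric, approximated by numerical integration over a sparse set of token budgets. The function name, the area-between-curves normalization, and all numbers below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def reasoning_efficiency_gap(budgets, frontier_acc, model_acc):
    """Hypothetical REG-style score: area between the empirical efficiency
    frontier and a model's accuracy-vs-token-budget curve, approximated by
    trapezoidal integration over a sparse set of budgets (illustrative only;
    the paper's precise definition may differ)."""
    budgets = np.asarray(budgets, dtype=float)
    # Per-budget shortfall relative to the frontier, clipped at zero.
    gap = np.clip(np.asarray(frontier_acc) - np.asarray(model_acc), 0.0, None)
    # Numerical integration over the sparse budget grid, normalized by its span.
    area = np.trapz(gap, budgets)
    return area / (budgets[-1] - budgets[0])

# Illustrative (made-up) accuracy curves on a sparse budget grid.
budgets  = [1024, 2048, 4096, 8192, 16384]
frontier = [0.20, 0.35, 0.50, 0.58, 0.60]
model    = [0.05, 0.15, 0.30, 0.50, 0.60]
print(f"REG ~ {reasoning_efficiency_gap(budgets, frontier, model):.3f}")
```

In this reading, a model matching the frontier at every budget would score zero, while a verbose model that only reaches frontier accuracy at large budgets accumulates a positive gap.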
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 6962