Keywords: reinforcement learning (with human feedback), fine-tuning, compression, decoding algorithms, reasoning algorithms
TL;DR: We present an efficient inference pipeline that optimizes Chain-of-Thought (CoT) reasoning: a Large Language Model (LLM), trained via reinforcement learning, generates concise yet effective CoTs that guide a Small Language Model (SLM) during decoding.
Abstract: Chain-of-Thought (CoT) reasoning has demonstrated remarkable effectiveness in enhancing the reasoning abilities of large language models (LLMs). However, its efficiency remains a challenge due to excessive intermediate reasoning tokens, which introduce both semantic redundancy and unnecessarily detailed reasoning steps. Moreover, computational expense and latency remain high, as cost scales with the number of output tokens, including these intermediate steps. In this work, we observe that most CoT tokens are unnecessary, and retaining only a small portion of them is sufficient for high-quality responses. Inspired by this, we propose Hawkeye, a novel post-training and inference framework in which a large model produces concise CoT instructions to guide a smaller model in response generation. Hawkeye quantifies redundancy in CoT reasoning and distills high-density information via reinforcement learning. By leveraging these concise CoTs, Hawkeye expands responses while significantly reducing token usage and computational cost. Our evaluation shows that Hawkeye achieves comparable response quality using only 35\% of the full CoTs, while improving clarity, coherence, and conciseness by approximately 10\%. Furthermore, Hawkeye accelerates end-to-end reasoning by up to 3.4× on complex math tasks while cutting inference cost by up to 60\%. Hawkeye will be open-sourced, and the models will be available soon.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 363