Keywords: LLM routing, Accuracy–cost trade-offs, Distribution shift, Linear convergence, Primal–dual optimization
TL;DR: This paper studies when reasoning benefits LLM-as-a-Judge and proposes RACER, a robust routing framework that selectively activates reasoning to improve accuracy–cost trade-offs under distribution shift.
Abstract: Reasoning-capable large language models (LLMs) have recently been adopted as automated judges, but their benefits and costs in LLM-as-a-Judge settings remain unclear. Through controlled comparisons between reasoning and non-reasoning judges, we show that explicit reasoning substantially improves judgment accuracy on tasks requiring structured verification (e.g., math and coding), while offering limited or even negative gains on simpler evaluations and incurring significantly higher computational cost. These findings indicate that reasoning should be applied selectively rather than universally, with awareness of possible distribution shift. We propose a Robust Adaptive Cost-Efficient Router (RACER), which dynamically selects between reasoning and non-reasoning judges under a fixed budget by formulating routing as a constrained distributionally robust optimization problem. RACER explicitly accounts for distribution shift via a KL-divergence uncertainty set, admits an efficient primal–dual algorithm, and enjoys theoretical guarantees, including uniqueness of the optimal policy and linear convergence. Extensive experiments show that RACER achieves superior accuracy–cost trade-offs under distribution shift.
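To make the KL-divergence uncertainty set concrete, the following is a minimal sketch, not the paper's RACER algorithm: it evaluates the worst-case expected judge error over a KL ball around the empirical distribution using the standard dual representation, sup over {Q : KL(Q‖P) ≤ ρ} of E_Q[ℓ] = min over λ > 0 of λ·log E_P[exp(ℓ/λ)] + λρ, then makes a toy routing decision. All losses, costs, and the threshold below are hypothetical placeholders.

```python
import math

def kl_dro_worst_case(losses, rho):
    """Worst-case expected loss over the KL ball {Q : KL(Q || P) <= rho},
    with P uniform over the samples, via the standard convex dual:
        min_{lam > 0}  lam * log E_P[exp(loss / lam)] + lam * rho.
    Coarse 1-D grid search over the dual variable (sketch only)."""
    best = float("inf")
    for k in range(1, 200):
        lam = 0.05 * k
        # log-sum-exp for numerical stability
        m = max(l / lam for l in losses)
        lse = m + math.log(sum(math.exp(l / lam - m) for l in losses) / len(losses))
        best = min(best, lam * lse + lam * rho)
    return best

# Toy routing decision: activate the reasoning judge only when its
# robust-loss reduction per unit cost clears a budget-derived threshold.
# All numbers below are illustrative assumptions, not results from the paper.
loss_fast = [0.4, 0.5, 0.3, 0.6]    # non-reasoning judge errors (hypothetical)
loss_reason = [0.1, 0.2, 0.1, 0.3]  # reasoning judge errors (hypothetical)
rho = 0.1                           # radius of the KL uncertainty set
gain = kl_dro_worst_case(loss_fast, rho) - kl_dro_worst_case(loss_reason, rho)
cost_ratio, threshold = 5.0, 0.05   # hypothetical cost model
use_reasoning = gain / cost_ratio > threshold
```

The dual reduces the inner maximization over distributions to a one-dimensional convex problem, which is what makes an efficient primal–dual treatment plausible; the robust value always lies between the empirical mean loss and the maximum loss, growing with ρ.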
Submission Number: 33