Block-Level Recursion: Adaptive Test-Time Routing in Large Language Models

Published: 01 Jun 2026, Last Modified: 01 Jun 2026. Venue: AdaptFM Poster. License: CC BY 4.0.
Keywords: llms, test-time routing, adaptive computation, inference-time scaling
TL;DR: Reuse layer blocks in frozen LLMs to improve the accuracy–FLOPs tradeoff without changing weights or generating extra tokens.
Abstract: Test-time routing improves frozen large language models (LLMs) by taking non-linear paths through their layers, without modifying weights or generating extra tokens. Existing approaches define route spaces that grow exponentially with depth, making them costly to search and hard to learn from. We therefore introduce $\textbf{Block-Level Recursion}$ ($\texttt{BLR}$), a restricted route family that repeats a single contiguous block of transformer layers once. This reduces the number of routes from exponential to quadratic in the number of layers, making exhaustive per-instance evaluation tractable and the oracle upper bound directly measurable. Despite this restriction, $\texttt{BLR}$ retains most of the routing potential. Across six model families and ten reasoning benchmarks, the optimal block varies across models, tasks, and individual inputs, with per-instance oracle gains of $+59.1\%$ on average and up to $+75.8\%$ on individual tasks. $\texttt{BLR}$ also supports two practical policies: a single train-selected block ($\texttt{sBLR}$) that requires no router or per-input overhead, and a learned global router ($\texttt{aBLR}$) trained from dense per-instance rewards over all routes. $\texttt{sBLR}$ already recovers a substantial fraction of the available gains, while $\texttt{aBLR}$ improves further by selecting routes per input. With a frozen Qwen2.5-0.5B backbone, $\texttt{aBLR}$ achieves higher accuracy than the unrouted Qwen2.5-7B model at lower FLOPs.
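To make the route space concrete, here is a minimal sketch of Block-Level Recursion in plain Python. It rests on several assumptions: layers are modeled as callables, the repeated block $[i, j]$ is assumed to run a second time immediately after its first pass, and per-instance rewards are assumed to be 0/1 correctness scores stored as one row per input and one column per route. Every function name here (`enumerate_blocks`, `layer_order`, `run_route`, `oracle_accuracy`, `select_single_block`) is hypothetical and illustrative only. This is not the authors' implementation. The sketch shows why the route count is quadratic, how a route changes the layer execution order, how the per-instance oracle is computed, and how a single train-selected block in the spirit of $\texttt{sBLR}$ could be chosen.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A route is either a contiguous block (start, end), inclusive, or None for the unrouted model.
Block = Optional[Tuple[int, int]]


def enumerate_blocks(num_layers: int) -> List[Tuple[int, int]]:
    """All contiguous blocks [i, j] with 0 <= i <= j < num_layers: L(L+1)/2 routes."""
    return [(i, j) for i in range(num_layers) for j in range(i, num_layers)]


def layer_order(num_layers: int, block: Block) -> List[int]:
    """Layer execution order when `block` is repeated once.

    Assumption: the second pass over [i, j] happens right after the first one.
    """
    order = list(range(num_layers))
    if block is None:
        return order
    i, j = block
    return order[: j + 1] + order[i : j + 1] + order[j + 1 :]


def run_route(layers: Sequence[Callable], hidden, block: Block):
    """Apply the frozen layers in routed order; no weights are modified."""
    for idx in layer_order(len(layers), block):
        hidden = layers[idx](hidden)
    return hidden


def oracle_accuracy(rewards: Sequence[Sequence[float]]) -> float:
    """Per-instance oracle: each input counts as solved if any route solves it."""
    return sum(max(row) for row in rewards) / len(rewards)


def select_single_block(rewards: Sequence[Sequence[float]]) -> int:
    """sBLR-style choice (assumed): the one route with the highest total train reward."""
    num_routes = len(rewards[0])
    totals = [sum(row[r] for row in rewards) for r in range(num_routes)]
    return max(range(num_routes), key=totals.__getitem__)
```

For example, a hypothetical 24-layer model has `len(enumerate_blocks(24)) == 300` candidate blocks, small enough to evaluate exhaustively per input, whereas a route space that grows exponentially with depth quickly becomes intractable. A learned per-input router, as in $\texttt{aBLR}$, could be trained on the rows of the same dense reward matrix that `oracle_accuracy` consumes; that training step is not sketched here.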
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 156