ExecTune: Effective Steering of Black-Box LLMs with Guide Models

Published: 02 Mar 2026, Last Modified: 10 Apr 2026 · LLA 2026 Poster · CC BY 4.0
Keywords: Large Language Models, Black-Box LLMs, LLM Reasoning, Reinforcement Learning
Abstract: For large language models deployed through black-box APIs, recurring inference costs often dominate one-time training costs, motivating composed agentic systems that amortize expensive reasoning into reusable intermediate representations. We study a broad class of such systems, termed Guide–Core Policies (GCoP), in which a guide model generates a structured strategy that is executed by a black-box core model. This abstraction subsumes base, supervised, and advisor-style approaches, which differ primarily in how the guide is trained. We formalize GCoP under a cost-sensitive utility objective and show that end-to-end performance is governed by guide-averaged executability: the probability that a strategy can be faithfully followed by the core. Our analysis reveals that existing instantiations of GCoP often fail to optimize executability under deployment constraints, leading to brittle strategies and inefficient computation. Guided by these insights, we propose ExecTune, a principled training recipe that combines teacher-guided acceptance sampling, supervised fine-tuning, and structure-aware reinforcement learning to directly optimize syntactic validity, execution success, and cost efficiency. Across mathematical reasoning and code-generation benchmarks, GCoP with ExecTune improves accuracy by up to **9.2%** over prior state-of-the-art baselines while reducing inference cost by up to **22.4%**. GCoP with ExecTune enables Claude Haiku-3.5 to surpass Sonnet-3.5 on math and code tasks and comes within **1.7%** absolute accuracy of Sonnet 4 at **38%** lower cost. Beyond efficiency, GCoP enables modular adaptation by updating guides without retraining the core.
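The guide–core decomposition with acceptance sampling described in the abstract can be sketched as follows. This is a minimal illustrative mock, not the paper's implementation: `guide`, `core`, and `core_executes` are hypothetical stubs standing in for the guide model, the black-box core API, and the executability check, and the resampling budget `max_tries` is an assumed parameter.

```python
def guide(task):
    """Stub guide model: emits a structured strategy for the task
    (hypothetical; a real guide would be a trained LLM)."""
    return {"steps": [f"step-{i}: decompose {task}" for i in range(3)]}

def core_executes(strategy):
    """Stub executability check: here a strategy counts as executable
    if it is a syntactically valid, non-empty list of steps."""
    return bool(strategy.get("steps"))

def core(task, strategy):
    """Stub black-box core model: follows the strategy to produce an answer."""
    return f"answer({task}) via {len(strategy['steps'])} steps"

def gcop(task, max_tries=4):
    """Acceptance sampling over guide strategies: resample the guide
    until the core can execute a strategy, then run the core on it."""
    for _ in range(max_tries):
        strategy = guide(task)
        if core_executes(strategy):
            return core(task, strategy)
    return None  # no executable strategy found within the budget

print(gcop("2+2"))  # → answer(2+2) via 3 steps
```

The point of the sketch is the division of labor: only the cheap guide is retrained (here, only `guide` would change), while the core stays a fixed black box, which is what permits the modular adaptation the abstract mentions.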
Submission Number: 115