Keywords: dynamic decoding, instruction-based control, truly end-to-end
TL;DR: We introduce AutoDeco, which dynamically generates sampling parameters and improves LLM performance with almost no added latency. Crucially, it enables the model to understand natural language commands and actively steer its own decoding parameters.
Abstract: The "end-to-end" label for LLMs is a misnomer. In practice, they depend on a non-differentiable decoding process that requires laborious hand-tuning of hyperparameters like temperature and top-p. This paper introduces AutoDeco, a novel architecture that enables truly "end-to-end" generation by learning to control its own decoding strategy. We augment the standard transformer with lightweight heads that, at each step, dynamically predict context-specific temperature and top-p values alongside the next-token logits. This approach transforms decoding into a parametric, token-level process, allowing the model to self-regulate its sampling strategy within a single forward pass.
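To make the architecture concrete, here is a minimal PyTorch sketch of such prediction heads. The module name, layer shapes, and activation choices are illustrative assumptions, not the authors' implementation; the abstract specifies only that the heads are lightweight and predict a temperature and top-p per token alongside the logits.

```python
# Minimal sketch, assuming single-layer heads over the final hidden state.
import torch
import torch.nn as nn

class AutoDecoHeads(nn.Module):
    """Lightweight heads mapping the last hidden state to per-token
    sampling parameters, predicted alongside the next-token logits."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Hypothetical single linear projections; the paper says only "lightweight".
        self.temp_head = nn.Linear(hidden_size, 1)
        self.top_p_head = nn.Linear(hidden_size, 1)

    def forward(self, hidden: torch.Tensor):
        # hidden: (batch, hidden_size) -- the same final-layer state that
        # feeds the LM head, so both predictions share one forward pass.
        # Softplus keeps temperature positive; sigmoid bounds top-p in (0, 1).
        temperature = nn.functional.softplus(self.temp_head(hidden)) + 1e-4
        top_p = torch.sigmoid(self.top_p_head(hidden))
        return temperature.squeeze(-1), top_p.squeeze(-1)
```

One extra linear projection per parameter would keep the added compute negligible, consistent with the TL;DR's claim of almost no added latency.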
Through extensive experiments on eight benchmarks, we demonstrate that AutoDeco not only significantly outperforms default decoding strategies but also achieves performance comparable to an oracle-tuned baseline derived from "hacking the test set"—a practical upper bound for any static method. Crucially, we uncover an emergent capability for instruction-based control: the model learns to interpret natural language commands (e.g., "generate with low randomness") and adjusts its predicted temperature and top-p on a token-by-token basis, opening a new paradigm for steerable and interactive LLM decoding.
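For illustration, a per-token decoding step that consumes the predicted parameters could look like the sketch below. The function name and filtering details are assumptions: standard temperature scaling followed by nucleus (top-p) filtering, but with values supplied dynamically at each step rather than fixed in advance.

```python
import torch

def sample_step(logits: torch.Tensor, temperature: torch.Tensor, top_p: torch.Tensor):
    """One decoding step using dynamically predicted parameters.
    logits: (vocab,); temperature, top_p: scalars from the AutoDeco heads."""
    probs = torch.softmax(logits / temperature, dim=-1)
    # Nucleus filtering with the predicted per-token threshold.
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Drop tokens whose preceding cumulative mass already reaches top_p
    # (the top-ranked token is always kept).
    cutoff = cumulative - sorted_probs >= top_p
    sorted_probs[cutoff] = 0.0
    sorted_probs = sorted_probs / sorted_probs.sum()
    return sorted_idx[torch.multinomial(sorted_probs, 1)]
```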
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 15634