Keywords: decoding, search, branching, beam search, adaptive
Abstract: Large language models (LLMs) achieve remarkable generative performance, yet their output quality depends on the decoding strategy. While sampling-based methods (e.g., top-$k$, nucleus) and search-based methods (e.g., beam search) can improve upon greedy decoding, both have limitations: sampling commits to a single path, while search often expends excessive computation regardless of task complexity. We introduce Entropy-informed DEcodiNg (EDEN), a plug-and-play, model-agnostic decoding framework that adaptively allocates computation based on the model’s own uncertainty. At each generation step, EDEN estimates the entropy of the output token distribution and sets the branching factor as a monotonically increasing function of that entropy, expanding more candidates in high-entropy regions and following a greedier path in low-entropy regions. This dynamic allocation improves search efficiency without requiring additional training, external rewards, or architecture changes. For closed-source models, we provide theoretical bounds on the number of samples needed to estimate entropy within a target error, ensuring robust branching under limited API access. Experiments across complex tasks, including mathematical reasoning, code generation, and scientific questions, demonstrate that EDEN consistently improves output quality over both fixed-parameter sampling and fixed-width search, achieving better trade-offs between accuracy and computational cost. By treating next-token selection as a noisy maximisation problem, we prove that branching factors monotone in entropy are guaranteed to find better (i.e., more probable) continuations than any fixed branching factor within the same total computation budget.
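The entropy-to-branching rule described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the schedule EDEN actually uses, so the linear mapping from normalised entropy to a branching factor, the bounds `k_min` and `k_max`, and all function names below are assumptions made purely for illustration.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def branching_factor(probs, k_min=1, k_max=4):
    """Map entropy to an integer branching factor that is monotone in entropy.

    Assumption: a linear schedule on entropy normalised by its maximum,
    log(vocabulary size). This is an illustrative choice, not the paper's rule.
    """
    max_entropy = math.log(len(probs))
    if max_entropy == 0.0:
        # A single-token vocabulary carries no uncertainty.
        return k_min
    ratio = min(1.0, token_entropy(probs) / max_entropy)
    return k_min + round(ratio * (k_max - k_min))


def expand(probs, k_min=1, k_max=4):
    """Return the indices of the candidate tokens kept at this step."""
    k = branching_factor(probs, k_min, k_max)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:k]


peaked = [0.97, 0.01, 0.01, 0.01]  # low entropy: behaves greedily
flat = [0.25, 0.25, 0.25, 0.25]    # maximum entropy: branches widely
print(expand(peaked), expand(flat))
```

Under these assumptions, a sharply peaked distribution keeps only the most likely token, while a uniform distribution keeps `k_max` candidates, which mirrors the greedy-versus-branching behaviour the abstract describes.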
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 9388