SEMIE: Semantic Entropy-Informed Decoding

Published: 02 Mar 2026, Last Modified: 06 Apr 2026
Venue: LIT Workshop @ ICLR 2026
License: CC BY 4.0
Track: long paper (up to 10 pages)
Keywords: adaptive decoding, beam search, semantic decoding, monte carlo, sampling
Abstract: Inference-time techniques, such as beam search and best-of-$n$, typically allocate a fixed amount of computation throughout LLM generation. However, recent entropy-informed decoding methods demonstrate that the shape of the output distribution provides a principled signal for more sample-efficient (adaptive) generation. Even so, existing entropy-based decoding methods operate at the *token level*, wasting computation on lexically distinct but semantically equivalent continuations. In this work, we introduce *Semantic Entropy-Informed Decoding (SEMIE)*, a Monte Carlo-based decoding strategy that operates over *semantic continuations* rather than tokens, adaptively branching in regions of high semantic information gain (measured via semantic entropy) to improve sample efficiency. Semantic entropy measures uncertainty at higher levels of abstraction (e.g., topics, strategies, or reasoning paths) rather than over the underlying tokens. We prove theoretically that adaptive allocation based on semantic entropy yields strictly lower semantic regret than both fixed-width and token-level entropy-informed decoding under equal compute. Empirically, SEMIE attains superior accuracy–compute trade-offs across a range of LLMs and benchmarks, consistently outperforming best-of-$n$, beam search, and token-level adaptive branching under generation-equated comparisons.
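As a rough illustration of the branching signal the abstract describes, the sketch below gives a Monte Carlo estimate of semantic entropy over a batch of sampled continuations (entropy over semantic equivalence classes rather than tokens) and uses it to pick a branch width. Everything here is an assumption for illustration, not the paper's implementation: the function names (`semantic_entropy`, `toy_equiv`), the greedy clustering, and the branching threshold are hypothetical, and a real equivalence predicate would use something like bidirectional NLI entailment rather than string matching.

```python
# Minimal sketch: Monte Carlo semantic-entropy estimate over sampled
# continuations, used as an adaptive-branching signal. Illustrative only;
# not the paper's actual algorithm or code.
import math
from typing import Callable, List


def semantic_entropy(
    continuations: List[str],
    equivalent: Callable[[str, str], bool],
) -> float:
    """Estimate entropy over semantic equivalence classes.

    Each sample counts equally, giving a Monte Carlo estimate of the
    distribution over semantic classes: p_k = |cluster_k| / n.
    """
    clusters: List[List[str]] = []
    for c in continuations:
        for cluster in clusters:
            if equivalent(c, cluster[0]):  # greedy single-link clustering
                cluster.append(c)
                break
        else:
            clusters.append([c])
    n = len(continuations)
    # H = -sum_k p_k * log(p_k) over the cluster probabilities.
    return -sum((len(k) / n) * math.log(len(k) / n) for k in clusters)


# Toy equivalence predicate (case/whitespace-insensitive match). In practice
# one would substitute a semantic test, e.g., bidirectional entailment.
def toy_equiv(a: str, b: str) -> bool:
    return " ".join(a.lower().split()) == " ".join(b.lower().split())


samples = ["Paris.", "paris.", "It is Paris.", "Lyon."]
h = semantic_entropy(samples, toy_equiv)
# Branch wider where semantic entropy is high; the 0.5 threshold and widths
# are arbitrary placeholders, not values from the paper.
width = 1 if h < 0.5 else 4
print(f"semantic entropy = {h:.3f}, branch width = {width}")
```

With the toy predicate, the four samples collapse into three semantic classes (the two "Paris." strings merge), so the estimated entropy is high and the decoder would allocate a wider branch at this point; a lexically diverse but semantically unanimous batch would instead collapse to one class, zero entropy, and no extra branching.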
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 19