Large Language Models as Probabilistic Search Agents – A Token Perspective

ACL ARR 2025 May Submission8099 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Autoregressive large language model (LLM) decoding can be cast as a guided stochastic search over a combinatorial token space. We formalise this perspective and prove three information‑theoretic results. (i) Greedy decoding is equivalent to a cost‑minimising breadth‑first search whose path cost is the cumulative negative log‑probability of the generated tokens. (ii) The attainable cross‑entropy of any model is bounded below by a quantity determined by the vocabulary size and the mutual information between context and next token, revealing a fundamental perplexity floor. (iii) Hallucination becomes inevitable once the search path's Shannon entropy exceeds this floor, at which point low‑probability continuations come to dominate. Two analytic case studies—a 3‑token arithmetic toy and a 5‑token chain‑of‑thought prompt—numerically verify the tightness of the bounds and illustrate how prompt engineering reshapes the explored sub‑space. All proofs are given in full, with longer derivations deferred to an appendix, and the resulting framework yields actionable guidelines for tokenizer design, prompting strategy, and retrieval augmentation, while explaining several empirical phenomena without large‑scale experiments. We complement the proofs with empirical studies on GPT-2 and Llama-3.1-8B-Instruct, showing that the predicted entropy bounds hold in practice and that the path-entropy diagnostic is practical for modern models on WikiText-103.
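The abstract's first result identifies the greedy decoding path with the sequence minimising cumulative negative log-probability step by step. A minimal sketch of that identity on a hypothetical 3-token toy model (the conditional distributions below are illustrative assumptions, not values from the paper):

```python
import math

# Hypothetical next-token model over a 3-token vocabulary {a, b, c}:
# maps a context tuple to a conditional distribution. All probabilities
# here are made up for illustration.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.3, "c": 0.1},
    ("a",): {"a": 0.1, "b": 0.7, "c": 0.2},
    ("b",): {"a": 0.5, "b": 0.4, "c": 0.1},
    ("c",): {"a": 0.3, "b": 0.3, "c": 0.4},
}

def greedy_decode(model, steps=2):
    """Greedy decoding: at each step pick the argmax token and
    accumulate the path cost -sum(log p), i.e. the cumulative
    negative log-probability of the chosen continuation."""
    context, cost = (), 0.0
    for _ in range(steps):
        probs = model[context]
        tok = max(probs, key=probs.get)   # locally cheapest expansion
        cost += -math.log(probs[tok])     # add per-step surprisal
        context = context + (tok,)
    return context, cost

path, cost = greedy_decode(TOY_MODEL)
print(path, round(cost, 4))
```

Each greedy step expands the locally cheapest node, so the returned cost is the Shannon surprisal of the greedy path; comparing it against the costs of all 3^2 two-step paths in this toy reproduces the kind of numerical check the abstract's 3-token case study describes.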
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Large Language Models (LLMs), Probabilistic Search
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 8099