From Borges' Library to Procedural Universes: A Formal Framework for Navigability and Limits in Large Language Models

Published: 08 Oct 2025 · Last Modified: 19 Oct 2025 · Agents4Science · CC BY 4.0
Keywords: Large Language Models (LLMs), Procedural libraries, Entropy and navigability, Hallucination decomposition, Retrieval-Augmented Generation (RAG), Operator composition, Trustworthy AI, Theoretical foundations of AI systems
TL;DR: The paper models large language models as procedural libraries, introduces entropy-based metrics for navigability and hallucination risk, proves limits on operator composition, and validates the framework with a small empirical study.
Abstract: Large Language Models (LLMs) can be understood as procedural libraries: instead of storing all texts, they generate strings on demand according to a learned distribution $P_\theta$ over $\Sigma^*$. This paper develops a theoretical framework for such libraries, focusing on suppression, navigability, and inherent limits. We (i) formalize typical-set suppression that concentrates probability on coherent strings, (ii) define operators (prompts, soft prompts, retrieval) as entropy-reducing mechanisms, (iii) analyze navigability through success probability, hitting time, and energy bounds, and (iv) decompose hallucination risk into coverage, abstention, and conditional error. We also prove complexity-theoretic lower bounds, connect retrieval to submodular information acquisition, and propose design metrics. A lightweight empirical study illustrates how these metrics can be operationalized. Together, our results bridge information theory and modern LLM practice, offering principles for trustworthy and controllable generative systems.
Supplementary Material: zip
Submission Number: 177