Large Language Models as a Computable Surrogate to Solomonoff Induction

ICLR 2026 Conference Submission 15102 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Large Language Models, Solomonoff Induction
Abstract: The rapid advancement of large language models (LLMs) calls for a rigorous theoretical framework to explain their empirical success. While significant progress has been made in understanding LLM behaviors, existing theoretical frameworks remain fragmented and lack a unified mathematical lens for explaining emergent phenomena. We establish the first formal connection between LLM architectures and Algorithmic Information Theory (AIT) by proving two fundamental results: (1) the training process computationally approaches the Solomonoff prior, with loss minimization interpreted as program-length optimization, and (2) under the assumption that $M(x_{1:t}) \approx \overline{M}(x_{1:t})$, LLMs' next-token prediction implements a form of surrogate Solomonoff induction. We leverage AIT to provide a heuristic, unified theoretical explanation for in-context learning, few-shot learning, and scaling laws. Furthermore, our theoretical insights lead to a principled method for few-shot example selection that prioritizes samples on which models exhibit lower predictive confidence. We demonstrate through experiments on diverse text classification benchmarks that this strategy yields significant performance improvements, particularly for smaller model architectures, compared to selecting high-confidence examples. Our framework bridges the gap between theoretical foundations and practical LLM behaviors, providing both explanatory power and actionable insights for future model development.
Primary Area: learning theory
Submission Number: 15102
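
The abstract describes a few-shot example selection strategy that prefers candidates on which the model is least confident. Below is a minimal sketch of that idea, not the authors' released code: `confidence_fn` is a hypothetical interface standing in for whatever probability the model assigns to an example's gold label.

```python
# Minimal sketch of low-confidence few-shot example selection, assuming a
# hypothetical `confidence_fn(text, label)` that returns the model's
# probability for the gold label. This is an illustration of the strategy
# described in the abstract, not the paper's implementation.

from typing import Callable, List, Tuple


def select_low_confidence_examples(
    candidates: List[Tuple[str, str]],           # (input_text, gold_label) pairs
    confidence_fn: Callable[[str, str], float],  # hypothetical: P(gold_label | input) under the model
    k: int = 4,
) -> List[Tuple[str, str]]:
    """Return the k candidates on which the model is least confident."""
    scored = [(confidence_fn(text, label), text, label) for text, label in candidates]
    scored.sort(key=lambda item: item[0])        # ascending: lowest confidence first
    return [(text, label) for _, text, label in scored[:k]]


if __name__ == "__main__":
    # Stub confidence function for illustration only.
    def fake_confidence(text: str, label: str) -> float:
        return 0.9 if "great" in text else 0.4

    pool = [
        ("great movie", "positive"),
        ("confusing plot twist", "negative"),
        ("great acting overall", "positive"),
        ("ambiguous ending", "negative"),
    ]
    print(select_low_confidence_examples(pool, fake_confidence, k=2))
```

The selected low-confidence examples would then be placed in the few-shot prompt, the intuition being that they carry more information for the model than examples it already predicts correctly with high confidence.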