The Information Game: Active Inference as Bilevel Optimization and a Game-Theoretic Benchmark for LLM Inquiry

ICLR 2026 Conference Submission 24937 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: active inference, bilevel optimization, question asking, query optimality, inference, LLMs
TL;DR: We frame question asking as bilevel optimization and use that framing to benchmark frontier LLMs on how efficiently they reduce uncertainty through their questions; we find these LLMs still lag an information-theoretic oracle.
Abstract: Large language models (LLMs) increasingly operate in settings where they must gather information rather than simply recall facts. We model this task as a multi-street game of incomplete information, casting each round of information gathering as a bilevel optimization: an inner variational Bayesian step that updates beliefs over a hidden target object, and an outer query-selection step that minimizes expected free energy, which is equivalent to maximizing expected information gain. This game-theoretic formulation motivates \emph{Optimal Question Asking} (OQA), a benchmark designed as a tractable "toy game" that measures an agent's inquiry strategy by how quickly it reduces uncertainty about the target. By solving this game for its game-theory-optimal (GTO) policy, we obtain a perfect oracle against which we measure the planning gap: the expected number of suboptimal queries. On 25-object tasks, models such as GPT-4o and Claude 3.5 Haiku exhibit a planning gap of 1-2 queries. On 100-object tasks, flagship models such as GPT-o3 and Gemini 2.5 Pro come closer to optimal but still show significant strategic leaks. Our synthetic datasets, which remove linguistic priors, reveal deeper deficits. OQA exposes inefficiencies that are invisible to answer-centric metrics, offering a controlled testbed for forging agents that play the information game not just exploitatively, but optimally.
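For concreteness, the outer query-selection step of the abstract can be made explicit. With noiseless answers, minimizing expected free energy reduces to maximizing expected information gain,

$$\mathrm{EIG}(q) \;=\; H\big[p(s)\big] \;-\; \mathbb{E}_{a \sim p(a \mid q)}\, H\big[p(s \mid q, a)\big],$$

where $p(s)$ is the current belief over candidate objects and $a$ ranges over possible answers. Below is a minimal sketch of the resulting greedy oracle policy, assuming binary (yes/no) questions represented as boolean masks over the candidate set; the function names and the index-threshold questions are illustrative assumptions, not taken from the paper's code.

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a discrete belief vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def expected_info_gain(belief, mask):
    """EIG of a noiseless yes/no question.

    mask[i] is True iff object i answers "yes"; the gain is the prior
    entropy minus the answer-averaged posterior entropy.
    """
    h_prior = entropy(belief)
    h_expected = 0.0
    for answer in (True, False):
        p_a = sum(b for b, m in zip(belief, mask) if m == answer)
        if p_a == 0.0:
            continue  # an impossible answer contributes nothing
        posterior = [b / p_a if m == answer else 0.0
                     for b, m in zip(belief, mask)]
        h_expected += p_a * entropy(posterior)
    return h_prior - h_expected

def gto_query(belief, questions):
    """Outer step: pick the question with maximal expected information
    gain (equivalently, minimal expected free energy in this setting)."""
    return max(range(len(questions)),
               key=lambda i: expected_info_gain(belief, questions[i]))

# Toy usage on a 25-object task with a uniform prior: hypothetical
# "is the hidden index < k?" questions. The oracle prefers k = 12,
# the split closest to halving the probability mass.
belief = [1.0 / 25] * 25
questions = [[i < k for i in range(25)] for k in (5, 12, 20)]
print(gto_query(belief, questions))  # -> 1 (the k = 12 question)
```

Under these assumptions, the planning gap is the expected number of extra queries an agent spends relative to repeatedly applying this oracle, with the posterior computed in the loop serving as the inner Bayesian belief update.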
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 24937