Agentic Collaboration as an Information Bottleneck Problem

Authors: ICLR 2026 Conference Submission19669 Authors

Published: 19 Sept 2025 (modified: 08 Oct 2025)
License: CC BY 4.0
Keywords: information bottleneck, rate-distortion theory, agentic collaboration, large language models, scaling laws
TL;DR: We frame agentic language model systems as an information bottleneck problem, deriving scaling laws and practical design principles for efficient collaboration between LMs.
Abstract: Agentic language model (LM) systems have rapidly become central to modern workflows, powering applications like "Deep Research" and "Claude Code." Beneath their apparent diversity lies a recurring pattern: smaller "compressor" LMs distill raw context into compact text that is then consumed by larger "predictor" LMs that interact with the user. Despite their popularity, the design of compressor-predictor systems remains largely ad hoc: little guidance exists on how compressor and predictor choices shape downstream performance, and attributing gains to compression versus prediction typically requires exhaustive pairwise sweeps. We argue that these agentic system design questions are, at root, information-theoretic. Viewing the compressor LM as a noisy channel, we introduce a simple estimator of the mutual information between the context and its compression to quantify compression quality in a task-independent way. Using a rate-distortion analysis, we show that mutual information strongly predicts downstream performance. With this toolkit, we perform a comprehensive empirical analysis across four datasets and three model families. Results reveal that larger compressors are both more accurate and more token-efficient, conveying more bits of mutual information per token: a 7B Qwen-2.5 compressor, for instance, is $1.6\times$ more accurate, $4.6\times$ more concise, and conveys $5.5\times$ more bits of mutual information per token than its smaller counterparts. Across the datasets studied, scaling compressors is substantially more effective than scaling predictors, enabling larger on-device compressors to pair with smaller cloud predictors. When applied to a Deep Research system, these principles enable local compressors as small as 3B parameters to recover 99% of frontier-LM accuracy at 26% of the API cost.
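The abstract does not spell out the mutual-information estimator, but a common plug-in approach is to score the compression z under a single reference LM with and without the context x, giving the pointwise mutual information i(x; z) = log p(z | x) - log p(z); averaging this over (context, compression) pairs yields a plug-in estimate of I(X; Z). The sketch below illustrates that idea only. The model name (Qwen/Qwen2.5-0.5B), the null prefix standing in for an empty conditioning string, and the per-token normalization are assumptions for illustration, not the authors' implementation.

import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-0.5B"  # assumed reference scorer; any causal LM works
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
lm = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
lm.eval()

@torch.no_grad()
def _sum_logprob(target: str, prefix: str) -> tuple[float, int]:
    """Sum of log p(each target token | prefix + preceding target tokens), plus token count.
    Tokenizing prefix and target separately only approximates the boundary tokenization."""
    prefix_ids = tok(prefix, return_tensors="pt").input_ids
    target_ids = tok(target, return_tensors="pt").input_ids
    ids = torch.cat([prefix_ids, target_ids], dim=1)
    logits = lm(ids).logits[:, :-1]  # logits at position t predict token t+1
    logps = torch.log_softmax(logits, dim=-1)
    token_lp = logps.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    n_prefix = prefix_ids.shape[1]
    # Keep only the log-probs of the target tokens, not the prefix.
    return token_lp[0, n_prefix - 1:].sum().item(), target_ids.shape[1]

def mi_bits_per_token(context: str, compression: str, null_prefix: str = "\n") -> float:
    """Pointwise MI estimate i(x; z) = log p(z | x) - log p(z), in bits per z-token.
    `null_prefix` is a stand-in for an empty conditioning string (an assumption of
    this sketch) so that every token of z is scored in both terms."""
    lp_cond, n_tokens = _sum_logprob(compression, prefix=context)  # log p(z | x)
    lp_marg, _ = _sum_logprob(compression, prefix=null_prefix)     # approx. log p(z)
    return (lp_cond - lp_marg) / (n_tokens * math.log(2))

Averaged over (context, compression) pairs produced by a compressor, this yields a bits-of-mutual-information-per-token figure of the kind the abstract uses to compare compressor sizes; positive values indicate the compression actually carries information about its context.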
Primary Area: foundation or frontier models, including LLMs
Submission Number: 19669