How Many Parameters for Multi-Hop? An Information-Theoretic Capacity Law for Knowledge Retrieval in Large Language Models
Keywords: multi-hop reasoning, information-theoretic capacity, scaling laws, factual memorization, large language models, knowledge retrieval, synthetic benchmark, parameter efficiency
TL;DR: We prove and empirically validate a closed-form bound linking model size to the ability of LLMs to answer k-hop factual queries, showing a sharp size-depth phase transition and providing a practical rule for sizing models to reasoning depth.
Abstract: How large must a language model be to answer questions that require chaining several facts together? We present the first information-theoretic answer. Treating an autoregressive transformer as a noisy associative-memory channel, we derive a closed-form lower bound that links model size, reasoning depth, and error tolerance. To evaluate the theory, we build a synthetic benchmark whose surface statistics stay identical as hop length grows, ensuring that only compositional reasoning becomes harder. Tests on Gemma-2B, LLaMA-7B, and Mistral-7B-Instruct show a sharp drop in multi-hop accuracy at almost exactly the depth the bound predicts, and unstructured pruning shifts the transition by the amount the theory forecasts. The result is both a tight theoretical limit on what current models can know through parameters alone and a practical rule of thumb for sizing models to the reasoning depth a downstream task requires: an early step toward scaling laws that target reasoning depth rather than token-level perplexity.
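The abstract does not state the bound's closed form, so the sketch below is a purely illustrative toy instantiation of the associative-memory view, not the paper's result: the functional form, the bits-per-parameter constant `c`, the fact count `M`, and the independence of hops are all assumptions made here for illustration. It shows how a capacity-style argument can yield a sharp depth transition.

```python
import math

# Purely illustrative toy model -- NOT the paper's bound. Assumes a
# P-parameter model stores ~c bits per parameter and must memorize M
# atomic facts of ~log2(M) bits each; a k-hop query chains k independent
# retrievals, so predicted k-hop accuracy is p**k with p the stored fraction.
def critical_depth(P: float, M: float, c: float = 2.0, eps: float = 0.05) -> float:
    """Depth k* at which the toy model's k-hop accuracy p**k falls below 1 - eps."""
    p = min(1.0, c * P / (M * math.log2(M)))  # assumed per-hop retrieval probability
    if p >= 1.0:
        return math.inf  # every fact fits in the parameters: no transition
    if p <= 0.0:
        return 0.0
    return math.log(1.0 - eps) / math.log(p)  # solve p**k = 1 - eps for k

# Toy numbers only: a ~1.3B-parameter model over 1e8 candidate facts.
print(f"predicted transition depth: {critical_depth(P=1.3e9, M=1e8):.1f} hops")
```

Under this toy reading, the pruning prediction follows directly: removing parameters lowers P, hence the per-hop probability p, hence the depth at which accuracy collapses.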
Archival Status: Archival (included in proceedings)
Submission Number: 63