How Much Context Does Natural Language Actually Require? An Analysis Using LLMs as Statistical Oracles
Keywords: LLMs, Long Context, Sampling, Decoding, Language Structure, Inference
TL;DR: For each context/next-token pair, we find the minimal sub-context required to predict the next token correctly.
Abstract: Despite the growing trend towards large-context transformer models, key questions remain about how much context is truly required for accurate language modeling. We explore this by treating large language models as statistical oracles and measuring the smallest prefix needed to replicate full-context next-token predictions. Using samples from diverse natural text sources, we evaluate minimal context length requirements across various decoding strategies with correctness and support set overlap metrics. Under greedy decoding, we find that over 80% of tokens require less than 10% of the most recent context to yield identical predictions. For general sampling strategies, we define Recall and Risk metrics to assess context dependence, and find that dynamic strategies offer higher support coverage at low percentiles, while also increasing Risk due to broader supports at shorter contexts.
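The sketch below illustrates the greedy-decoding measurement described in the abstract: find the shortest suffix of the context whose greedy next-token prediction matches the full-context prediction. It is a minimal sketch, not the authors' code; it assumes a Hugging Face `transformers` causal LM (`gpt2` as a placeholder), the helper names are hypothetical, and a simple linear scan over suffix lengths stands in for whatever search procedure the paper actually uses.

```python
# Minimal sketch (not the submission's code): measure the smallest recent-context
# suffix whose greedy prediction matches the full-context greedy prediction.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM should work
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def greedy_next_token(input_ids: torch.Tensor) -> int:
    # Greedy decoding: argmax over the logits at the final position.
    logits = model(input_ids).logits[0, -1]
    return int(logits.argmax())

@torch.no_grad()
def minimal_suffix_length(text: str) -> tuple[int, int]:
    """Return (minimal suffix length, full context length) such that greedy
    prediction on the suffix matches greedy prediction on the full context."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    full_len = ids.shape[1]
    target = greedy_next_token(ids)
    # Linear scan from shortest to longest suffix, for clarity; agreement with
    # the full-context prediction need not be monotone in suffix length.
    for k in range(1, full_len + 1):
        if greedy_next_token(ids[:, full_len - k:]) == target:
            return k, full_len
    return full_len, full_len

k, n = minimal_suffix_length("The capital of France is")
print(f"{k}/{n} tokens ({100 * k / n:.1f}% of the context) suffice under greedy decoding.")
```

Extending this to general sampling strategies would compare the support sets of the truncated and full-context next-token distributions rather than a single argmax, which is where the paper's Recall and Risk metrics come in.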
Code: ipynb
Submission Number: 71