Keywords: Alignment Tuning, LLM, Probability Concentration, Chain-of-Thought
TL;DR: "LLM probability concentration" unifies disparate alignment phenomena by showing that alignment works not by changing a model fundamentally, but by steering it down latent low-entropy paths.
Abstract: Despite their impressive capabilities, aligned large language models (LLMs) often generate outputs that lack diversity. What drives this stability in generation? We investigate this phenomenon through the lens of probability concentration in the model’s output distribution. To quantify this concentration, we introduce the Branching Factor (BF), a token-invariant measure of the effective number of plausible next steps during generation. Our empirical analysis reveals two key findings: (1) BF often decreases as generation progresses, suggesting that LLMs become more predictable as they generate. (2) Alignment tuning substantially sharpens the model's output distribution from the outset, reducing BF by nearly an order of magnitude (e.g., from 12 to 1.2) relative to base models. This stark reduction helps explain why aligned models often appear less sensitive to decoding strategies. We show that aligned Chain-of-Thought (CoT) models (e.g., DeepSeek-distilled models) exhibit even lower BF and reduced variance across samples, as CoT extends reasoning into later, more deterministic positions, leading to more stable outputs. We hypothesize that alignment tuning does not fundamentally change a model’s behavior, but instead steers it toward specific stylistic tokens (e.g., "Sure") that unlock low-entropy trajectories already present in the base model. This view is supported by nudging experiments, which show that prompting base models with such tokens can similarly reduce BF. Together, our findings establish BF as a powerful diagnostic for understanding and controlling output behavior in LLMs: it clarifies why alignment reduces variability, how CoT promotes stable generations, and how base models can be steered toward or away from diversity.
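The abstract does not spell out how the Branching Factor is computed, so the following is only a minimal sketch of one plausible reading: treating the per-step BF as the exponential of the Shannon entropy of the next-token distribution (a perplexity-style "effective number of choices") and aggregating over generation steps. The paper's exact token-invariant definition may differ; the toy distributions below are chosen merely to mirror the 12 vs. 1.2 contrast quoted above.

```python
import numpy as np

def step_branching_factor(probs):
    """Effective number of plausible next tokens for one next-token
    distribution, taken here as exp(Shannon entropy). (Assumed form,
    not the paper's official definition.)"""
    p = probs[probs > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

def sequence_branching_factor(step_distributions):
    """Aggregate BF over a generated sequence: exp of the mean per-step
    entropy, i.e., the geometric mean of per-step BFs."""
    entropies = [-(p[p > 0] * np.log(p[p > 0])).sum() for p in step_distributions]
    return float(np.exp(np.mean(entropies)))

# Toy contrast: a flat 12-way distribution (base-model-like, BF ~ 12)
# vs. a sharply peaked one (aligned-model-like, BF ~ 1.2).
flat = np.full(12, 1 / 12)
peaked = np.array([0.97] + [0.03 / 11] * 11)
print(step_branching_factor(flat))    # ~12.0
print(step_branching_factor(peaked))  # ~1.2
```

In practice the per-step distributions would come from a model's softmaxed logits at each generated position, so the same aggregation can be applied to compare base, aligned, and CoT-distilled models.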
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 14595