Keywords: Interpretability, Large language models, Information theory
TL;DR: A single entropy value per layer uncovers how frozen transformers compute across models, tasks, and domains.
Abstract: Transformer blocks iteratively refine next-token distributions, yet most interpretability tools analyze hidden states rather than token-space dynamics.
We introduce Entropy-Lens, a model-agnostic method that tracks the entropy of logit-lens predictions across layers, yielding an entropy profile: a per-layer, permutation-invariant scalar summary of token prediction dynamics.
Entropy differences between consecutive layers serve as a proxy for two strategies: expansion (broadening the candidate token set) and pruning (narrowing it).
Across model families and scales, entropy profiles show stable family-specific token prediction dynamics and exhibit depth-rescaling invariance.
Finally, selectively skipping layers associated with maximal expansion or pruning shows that the two strategies have unequal functional importance for downstream multiple-choice accuracy, with expansion typically being more critical.
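The abstract's core quantity, a per-layer entropy of logit-lens next-token distributions, can be sketched in a few lines. Below is a minimal illustration, not the authors' released code: it assumes a Hugging Face causal LM (here `gpt2`, whose final layer norm is `model.transformer.ln_f`; the attribute names vary by model family) and decodes every intermediate hidden state with the final layer norm and unembedding matrix.

```python
# Minimal sketch of an entropy profile via the logit lens.
# Model choice and module names are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM exposing hidden states
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

ln_f = model.transformer.ln_f  # GPT-2 naming; differs across families
unembed = model.lm_head

entropies = []
for h in out.hidden_states:            # one tensor per layer (plus embeddings)
    logits = unembed(ln_f(h[:, -1]))   # next-token logits at the last position
    probs = torch.softmax(logits, dim=-1)
    ent = -(probs * torch.log(probs + 1e-12)).sum(dim=-1)
    entropies.append(ent.item())

# Consecutive differences: positive deltas suggest expansion
# (more candidate tokens), negative deltas suggest pruning.
deltas = [b - a for a, b in zip(entropies, entropies[1:])]
print("profile:", [round(e, 2) for e in entropies])
print("deltas: ", [round(d, 2) for d in deltas])
```

The resulting list of per-layer entropies is the "entropy profile" described above, and the sign of each consecutive difference gives the expansion/pruning reading used in the abstract.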
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Style Files: I have used the style files.
Submission Number: 114