From Entropy Rate to Redundancy: Information Dynamics in Large Language Models

Published: 23 Sept 2025, Last Modified: 23 Dec 2025, SPIGM @ NeurIPS, CC BY 4.0
Keywords: Entropy Rate, Redundancy, Information Dynamics, Large Language Models
TL;DR: Information Dynamics in Large Language Models
Abstract: Large language models (LLMs) achieve impressive performance, yet the mechanisms by which information flows and adapts during fine-tuning remain underexplored. We introduce entropy rate as a dynamic, space-time information-theoretic metric that captures how uncertainty propagates across layers and evolves across epochs. Building on this foundation, we derive the redundancy score, a tractable approximation that quantifies the predictability of each layer’s representations from its neighbors. Layers with high redundancy contribute little novel information and are strong candidates for structured pruning. Empirical studies on RoBERTa-base (GLUE benchmark) show that redundancy-score pruning achieves substantial compression while preserving accuracy, outperforming knowledge-entropy pruning, LayerDrop, and SlimLLM. Beyond compression, redundancy profiles reveal consistent architectural patterns, with mid-layer peaks corresponding to dynamic representational activity. These findings position entropy rate and redundancy score as principled, interpretable tools for analyzing, optimizing, and compressing foundation models in natural language understanding and reasoning.
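The abstract does not give the exact estimator for the redundancy score, but it describes it as the predictability of a layer's representations from its neighbors. As a rough, non-authoritative illustration of that idea, the sketch below scores each interior layer by how well its hidden states are linearly predicted from the adjacent layers' states (higher fit implying higher redundancy); the use of ridge regression and R^2 here is an assumption for illustration, not the paper's method.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score


def redundancy_scores(hidden_states):
    """Hypothetical per-layer redundancy sketch.

    hidden_states: list of arrays, one per layer, each of shape
    (num_tokens, hidden_dim), e.g. collected from a RoBERTa forward pass.
    Returns one score per interior layer; higher = more predictable from
    neighbors = a stronger pruning candidate under this illustration.
    """
    scores = []
    for i in range(1, len(hidden_states) - 1):
        # Predict layer i's activations from its immediate neighbors (i-1, i+1).
        neighbors = np.concatenate(
            [hidden_states[i - 1], hidden_states[i + 1]], axis=1
        )
        target = hidden_states[i]
        model = Ridge(alpha=1.0).fit(neighbors, target)
        scores.append(r2_score(target, model.predict(neighbors)))
    return scores


# Toy usage with random activations (4 layers, 256 tokens, width 64).
layers = [np.random.randn(256, 64) for _ in range(4)]
print(redundancy_scores(layers))
```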
Submission Number: 32