Keywords: language modeling
Abstract: Auto-regressive (AR) language models factorize sequence probabilities as $P_\theta(\mathbf{w}) = \prod_t P_\theta(w_t \mid \mathbf{w}_{<t})$. While empirically powerful, their internal mechanisms remain only partially understood. This work introduces an analytical framework using Markov Categories (MCs), specifically the category $\mathrm{Stoch}$ of standard Borel spaces and Markov kernels. We model the AR generation step as a composite kernel. Leveraging the enrichment of $\mathrm{Stoch}$ with statistical divergences $D$ and the associated categorical information measures (entropy $\mathcal{H}_D$, mutual information $I_D$), we define principled metrics: Representation Divergence, State-Prediction Information, Temporal Coherence, LM Head Stochasticity, and Information Flow Bounds via the Data Processing Inequality. Beyond providing metrics, we use this framework to analyze the negative log-likelihood (NLL) objective, arguing that NLL minimization amounts to optimal compression and to learning the data's intrinsic stochasticity. We employ information geometry, analyzing the pullback Fisher-Rao metric $g^*$ on the representation space $\mathcal{H}$, to understand learned sensitivities. Furthermore, we formalize the concept that NLL acts as implicit structure learning, demonstrating how minimizing NLL forces the representations of predictively dissimilar contexts apart.
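As a concrete illustration of the pullback Fisher-Rao metric $g^*$ (not code from the paper), the following is a minimal sketch under the simplifying assumption of a linear softmax LM head $p(\cdot \mid h) = \mathrm{softmax}(Wh)$ mapping a representation $h \in \mathcal{H}$ to a next-token distribution; the matrix $W$, the function name, and the toy dimensions are illustrative assumptions. In logit coordinates $z = Wh$ the Fisher information is $F(z) = \mathrm{diag}(p) - p p^\top$, so the pullback onto $\mathcal{H}$ is $g^*(h) = W^\top F(Wh)\, W$.

```python
import numpy as np

def pullback_fisher_rao(W, h):
    """Pullback of the Fisher-Rao metric on next-token distributions onto the
    representation space H, assuming a linear softmax head p(. | h) = softmax(W h).
    In logit coordinates z = W h the Fisher information is F = diag(p) - p p^T,
    and the pullback metric at h is g*(h) = W^T F W. (Illustrative sketch, not
    the paper's implementation.)"""
    z = W @ h
    z = z - z.max()                          # stabilize the softmax
    p = np.exp(z) / np.exp(z).sum()          # next-token distribution p(. | h)
    F = np.diag(p) - np.outer(p, p)          # Fisher information in logit coordinates
    return W.T @ F @ W                       # pullback metric g*(h) on H

# Toy usage: vocabulary of 5 tokens, 3-dimensional representation space.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))
h = rng.normal(size=3)
g_star = pullback_fisher_rao(W, h)
print(np.linalg.eigvalsh(g_star))            # sensitivity of p(. | h) to directions in H
```

Working in logit coordinates avoids dividing by small probabilities, which the simplex form $\sum_k p_k^{-1}\,\partial_a p_k\,\partial_b p_k$ would require; the two expressions yield the same pullback metric for this head.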
Submission Number: 53