Spectral Scaling Laws in Language Models: {\em How Effectively Do Feed-Forward Networks Use Their Latent Space?}

ACL ARR 2025 May Submission7997 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: As Large Language Models (LLMs) continue to scale, understanding how effectively their internal capacity is utilized becomes increasingly important, especially for inference-time efficiency. While existing scaling laws relate model size to loss and compute, they offer little insight into the representational dynamics of individual components. In this work, we focus on the Feed-Forward Network (FFN), a dominant sub-block in decoder-only transformers, and recast FFN width selection as a {\em spectral utilization} problem. We introduce a lightweight, differentiable diagnostic suite comprising \textbf{Hard Rank} (participation ratio), \textbf{Soft Rank} (spectral entropy), \textbf{Spectral Concentration}, and the composite \textbf{Spectral Utilization Index} (SUI), designed to quantify how many latent directions are meaningfully activated. Our spectral audit across GPT-2, LLaMA, and nGPT models reveals that spectral utilization grows with model size but not monotonically with width, often peaking at intermediate dimensions (e.g., $D=2048$). We identify clear instances of \emph{spectral collapse}, where wider FFNs concentrate variance into a narrow subspace, leaving much of the latent space unused.
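For concreteness, the first two diagnostics admit standard definitions; the abstract does not spell out the exact formulas or the SUI weighting, so the following assumes the conventional participation-ratio and spectral-entropy forms. Given the singular values $\sigma_i$ of a centered FFN activation matrix and the normalized spectrum $p_i = \sigma_i^2 / \sum_j \sigma_j^2$,
$$
\mathrm{HardRank} \;=\; \frac{\bigl(\sum_i \sigma_i^2\bigr)^2}{\sum_i \sigma_i^4} \;=\; \frac{1}{\sum_i p_i^2},
\qquad
\mathrm{SoftRank} \;=\; \exp\!\Bigl(-\sum_i p_i \log p_i\Bigr).
$$
Under these forms, both quantities range from $1$ (all variance concentrated in a single direction) to the full width $D$ (a flat spectrum), so each can be read directly as an effective count of utilized latent directions.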
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: spectral utilization
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 7997