NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks

ICLR 2026 Conference Submission 16222 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Eigenspectral analysis, Feed-forward networks, Nonlinearity, Latent space geometry, Large Language Models (LLMs)
TL;DR: We introduce NerVE, a lightweight eigenspectral framework revealing how FFN nonlinearities reinject and reshape variance across eigenmodes.
Abstract: We introduce NerVE, a unified eigenspectral framework for understanding how feed-forward networks (FFNs) in large language models (LLMs) organize and regulate information flow in high-dimensional latent space. Although FFNs dominate the parameter budget, their high-dimensional dynamics remain poorly understood. NerVE addresses this gap through lightweight, memory-efficient tracking of eigenspectrum dynamics via four complementary metrics: Spectral Entropy (dispersion), Participation Ratio (effective dimensionality), Eigenvalue Early Enrichment (top-heaviness), and Jensen-Shannon divergence (distributional shifts). Our key insight is that FFN nonlinearities reinject and reshape variance across eigenmodes, fundamentally governing latent dimension utilization. We validate NerVE across model scales and diverse architectural configurations that each uniquely shape FFN dynamics: normalization strategies (PreLN, PostLN, MixLN, Norm-Free) controlling variance flow; FFN weight geometries constraining the latent space; positional encodings and activation functions modulating information propagation. Across these settings, NerVE consistently recovers stable spectral signatures that correlate with a model's generalization ability and respond predictably to design choices, providing actionable insights for architectural optimization beyond trial-and-error.
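The four metrics named in the abstract can be illustrated with a short sketch. The snippet below is not the authors' implementation: it computes standard forms of spectral entropy, participation ratio, and Jensen-Shannon divergence from an eigenvalue spectrum, and uses an assumed top-k variance-mass proxy for "Eigenvalue Early Enrichment"; the feature matrices, the ReLU nonlinearity, and the choice of k are hypothetical.

```python
# Minimal sketch (assumptions noted above) of the four spectral metrics applied to the
# eigenvalues of a latent covariance matrix, e.g. of FFN activations.
import numpy as np
from scipy.spatial.distance import jensenshannon


def normalize_spectrum(eigvals):
    """Turn nonnegative eigenvalues into a probability distribution over eigenmodes."""
    lam = np.clip(eigvals, 0.0, None)  # clip small negative values from numerical noise
    return lam / lam.sum()


def spectral_entropy(eigvals):
    """Shannon entropy of the normalized spectrum: higher means variance is more dispersed."""
    p = normalize_spectrum(eigvals)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def participation_ratio(eigvals):
    """Effective dimensionality: (sum of eigenvalues)^2 / sum of squared eigenvalues."""
    lam = np.clip(eigvals, 0.0, None)
    return float(lam.sum() ** 2 / (lam ** 2).sum())


def early_enrichment(eigvals, k=10):
    """Assumed proxy for top-heaviness: fraction of total variance in the top-k eigenvalues."""
    p = np.sort(normalize_spectrum(eigvals))[::-1]
    return float(p[:k].sum())


def spectral_js_divergence(eigvals_a, eigvals_b):
    """Jensen-Shannon divergence between two normalized spectra (scipy returns the distance)."""
    return float(jensenshannon(normalize_spectrum(eigvals_a), normalize_spectrum(eigvals_b)) ** 2)


if __name__ == "__main__":
    # Hypothetical example: compare spectra before and after a ReLU-style FFN nonlinearity.
    rng = np.random.default_rng(0)
    pre = rng.standard_normal((4096, 512))      # pre-activation features (tokens x dims)
    post = np.maximum(pre, 0.0)                 # post-nonlinearity features
    eig_pre = np.linalg.eigvalsh(np.cov(pre, rowvar=False))
    eig_post = np.linalg.eigvalsh(np.cov(post, rowvar=False))
    print("entropy:", spectral_entropy(eig_post))
    print("participation ratio:", participation_ratio(eig_post))
    print("early enrichment (k=10):", early_enrichment(eig_post))
    print("JS divergence pre vs post:", spectral_js_divergence(eig_pre, eig_post))
```

Tracking these quantities per layer over training is what makes the framework lightweight: only the eigenvalue spectrum of each activation covariance needs to be stored, not the activations themselves.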
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 16222