Abstract: Despite extensive study, the fundamental significance of sharpness---the trace of the loss Hessian at local minima---remains unclear. While sharpness is often associated with generalization, recent work reveals inconsistencies in this relationship. We explore an alternative perspective by investigating how sharpness relates to the geometric structure of neural representations in feature space. Specifically, we build on earlier work by Ma and Ying to broadly study the compression of representations, defined as the degree to which neural activations concentrate when inputs are locally perturbed. We introduce three quantitative measures: the Local Volumetric Ratio (LVR), which captures volume contraction through the network; the Maximum Local Sensitivity (MLS), which measures the maximum output change normalized by the magnitude of the input perturbation; and Local Dimensionality, which captures the uniformity of compression across directions.
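To make these definitions concrete, here is a minimal first-order sketch that estimates proxies for MLS and LVR from the Jacobian of a feature map. The function names (`max_local_sensitivity`, `local_volumetric_ratio`) and the use of Jacobian singular values are illustrative assumptions, not the paper's exact estimators.

```python
# Minimal sketch (illustrative, not the paper's estimators): to first order,
# MLS at an input x is the largest singular value of the Jacobian of the
# feature map phi, i.e. the worst-case ratio ||phi(x + d) - phi(x)|| / ||d||
# for infinitesimal perturbations d; a local volume-change proxy sums the
# log singular values.
import torch

def max_local_sensitivity(phi, x):
    """First-order MLS proxy: spectral norm of the Jacobian of phi at x."""
    jac = torch.autograd.functional.jacobian(phi, x)        # out_dim x in_dim
    jac = jac.reshape(-1, x.numel())
    return torch.linalg.matrix_norm(jac, ord=2).item()

def local_volumetric_ratio(phi, x):
    """Hypothetical LVR proxy: log-volume change of a small input ball under phi,
    read off from the Jacobian's singular values."""
    jac = torch.autograd.functional.jacobian(phi, x).reshape(-1, x.numel())
    svals = torch.linalg.svdvals(jac)
    return torch.log(svals.clamp_min(1e-12)).sum().item()

# Usage on a toy feature extractor
phi = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 10))
x = torch.randn(10)
print(max_local_sensitivity(phi, x), local_volumetric_ratio(phi, x))
```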
We derive upper bounds showing that LVR and MLS are mathematically constrained by sharpness: flatter minima necessarily limit the values these compression metrics can take. These bounds extend to reparametrization-invariant sharpness (sharpness measures unchanged under layer rescaling), addressing a key limitation of standard sharpness. We introduce network-wide variants (NMLS, NVR) that account for all layer weights, providing tighter and more stable bounds than prior single-layer analyses. Empirically, we validate these predictions across feedforward, convolutional, and transformer architectures, demonstrating a consistent positive correlation between sharpness and the compression metrics. Our results suggest that sharpness fundamentally quantifies representation compression rather than generalization directly, offering a resolution to contradictory findings on the sharpness-generalization relationship and establishing a principled mathematical link between parameter-space geometry and feature-space structure.
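For reference, the "sharpness" bounded against here is the trace of the loss Hessian in parameter space; a generic Hutchinson-style estimator for it is sketched below. This is an illustrative assumption about how such a quantity can be computed, not the paper's implementation.

```python
# Minimal sketch: Hutchinson estimate of tr(H) for the loss Hessian w.r.t. the
# model parameters, using Rademacher probes and Hessian-vector products.
import torch

def hessian_trace(loss_fn, params, n_probes=20):
    """Hutchinson estimate of the trace of the loss Hessian over `params`."""
    params = [p for p in params if p.requires_grad]
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params, create_graph=True)
    trace = 0.0
    for _ in range(n_probes):
        vs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]  # +/-1 probes
        hv = torch.autograd.grad(grads, params, grad_outputs=vs, retain_graph=True)
        trace += sum((v * h).sum() for v, h in zip(vs, hv)).item()
    return trace / n_probes

# Usage on a toy model and cross-entropy loss
model = torch.nn.Linear(5, 3)
x, y = torch.randn(64, 5), torch.randint(0, 3, (64,))
loss_fn = lambda: torch.nn.functional.cross_entropy(model(x), y)
print(hessian_trace(loss_fn, model.parameters()))
```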
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
### **Foundational Rewrites (Addressed Reviewer KMtE's Core Concerns)**
- Complete abstract rewrite with upfront definitions and intuitive contributions
- Enhanced the introduction with a "Key terminology" section
- Contributions list rewrite with implications, scope, and limitations
### **Section 3 Reorganization (Addressed Reviewers 8k2n & KMtE)**
- Added "Key assumptions" at start of Section 3, highlighted introduction roadmap
- Added motivation paragraphs before each definition/proposition
- Clarified Equation 7 derivation and connections
- Added "Why LVR and MLS quantify compression" explanation
### **Technical Clarifications (Addressed Reviewers 8k2n & sf7P)**
- Revised loss function discussion, removed "loss-agnostic" claim, fixed CE loss explanation
- Added "On the role of weight norms in the bounds" addressing sharpness-compression nuances
- Clarified robustness of representations, fixed needle-in-haystack example, defined OOD
- Rewrote neural collapse discussion with honest assessment of connection
### **Minor Technical Fixes (Addressed Reviewer 8k2n)**
- Fixed "with high probability" issue, clarified Lemma 3.2 proof, fixed symbol error
Assigned Action Editor: ~Qing_Qu2
Submission Number: 5672