Irreducible Loss Floors in Gradient Descent Convergence and Energy Footprint

Submitted to NeurIPS 2025 on 11 May 2025 (modified: 29 Oct 2025). License: CC BY 4.0.
Keywords: gradient descent, convergence, loss bounds, optimization, training dynamics, sustainability, efficiency, feasibility, computational cost, irreducible loss, non-convex optimization, lower bounds
TL;DR: We derive theoretical lower bounds for loss functions, revealing convergence limits without relying on standard convexity assumptions.
Abstract: Despite their central role, convergence analyses of the dynamics of loss functions during training require strong assumptions (e.g., convexity and smoothness) which are non-trivial to verify. In this work, we introduce a framework for deriving necessary convergence conditions that hold without restrictive assumptions on the dataset or the model architecture. By linking microscopic properties, such as individual sample losses and their gradients, to macroscopic training dynamics, we derive tight lower bounds on loss functions, applicable to both full-batch and mini-batch gradient systems. These bounds reveal irreducible floors that optimizers cannot surpass. Beyond theoretical guarantees, the framework offers a practical tool for anticipating convergence speed and estimating minimum training time and energy requirements, and can thus be used to assess the sustainability and feasibility of large-scale training regimes.
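To illustrate how a loss floor could translate into minimum training time and energy estimates, the sketch below assumes a hypothetical irreducible floor `loss_floor`, a hypothetical geometric decay factor `rho` for the excess loss per step, and an assumed per-step energy cost `energy_per_step_kwh`; these quantities and the decay model are placeholders, not the bounds derived in the paper.

```python
import math

def min_steps_to_target(initial_loss, loss_floor, target_loss, rho):
    """Estimate the minimum number of optimizer steps needed to reach
    `target_loss`, assuming the excess loss above the floor shrinks by
    at most a factor `rho` per step (a hypothetical geometric model)."""
    if target_loss <= loss_floor:
        return math.inf  # the floor is irreducible: the target is unreachable
    excess_initial = initial_loss - loss_floor
    excess_target = target_loss - loss_floor
    return math.ceil(math.log(excess_target / excess_initial) / math.log(rho))

def min_energy_kwh(steps, energy_per_step_kwh):
    """Convert a lower bound on steps into a lower bound on energy,
    given a measured (or assumed) energy cost per optimizer step."""
    return steps * energy_per_step_kwh

# Example with made-up numbers:
steps = min_steps_to_target(initial_loss=2.5, loss_floor=0.8,
                            target_loss=0.9, rho=0.999)
print(steps, "steps,", min_energy_kwh(steps, 2e-4), "kWh at minimum")
```

Under these assumptions, any target loss at or below the floor is flagged as unreachable, and targets above it yield a step count that scales with the log of the remaining excess loss, which is then converted into a minimum energy budget.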
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 19415