Keywords: gradient descent, convergence, loss bounds, optimization, training dynamics, sustainability, efficiency, feasibility, computational cost, irreducible loss, non-convex optimization, lower bounds
TL;DR: We derive theoretical lower bounds for loss functions, revealing convergence limits without relying on standard convexity assumptions.
Abstract: Despite their central role, convergence analyses of the dynamics of loss functions
during training require strong assumptions (e.g., convexity and smoothness) which
are non-trivial to prove. In this work, we introduce a framework for deriving
necessary convergence conditions that hold without restrictive assumptions on
the dataset or the model architecture. By linking microscopic properties, such as
individual sample losses and their gradients, to macroscopic training dynamics, we
derive tight lower bounds for loss functions, applicable to both full-batch and mini-
batch gradient systems. These bounds reveal irreducible loss floors that optimizers
cannot surpass. Beyond theoretical guarantees, the framework offers a practical
tool for anticipating convergence speed and estimating minimum training time
and energy requirements, and can thus be used to assess the sustainability and
feasibility of large-scale training regimes.
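To make the notion of an irreducible loss floor concrete, here is a minimal toy sketch (not the paper's actual framework): full-batch gradient descent on linear regression with additive label noise of standard deviation `sigma`. No matter how long the optimizer runs, the training MSE plateaus near `sigma**2`, the variance of the noise, which no parameter setting can explain away. All names (`w_true`, `lr`, the problem sizes) are illustrative assumptions.

```python
import numpy as np

# Synthetic regression problem with irreducible label noise.
rng = np.random.default_rng(0)
n, d, sigma = 2000, 5, 0.5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + sigma * rng.normal(size=n)  # noise sets the loss floor

# Full-batch gradient descent on the mean squared error.
w = np.zeros(d)
lr = 0.1
for _ in range(500):
    resid = X @ w - y
    grad = (2.0 / n) * X.T @ resid  # gradient of mean((Xw - y)^2)
    w -= lr * grad

final_mse = np.mean((X @ w - y) ** 2)
# final_mse plateaus near sigma**2 = 0.25: an irreducible floor
# that further optimization cannot push below.
```

The same plateau appears with mini-batch updates; the floor is a property of the data-generating process, not of the optimizer.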
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 19415