Sassha: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We introduce Sassha, a novel second-order optimization method that improves generalization by reducing solution sharpness.
Abstract: Approximate second-order optimization methods often generalize more poorly than first-order approaches. In this work, we examine this issue through the lens of the loss landscape and find that existing second-order methods tend to converge to sharper minima than SGD. In response, we propose Sassha, a novel second-order method designed to enhance generalization by explicitly reducing the sharpness of the solution while stabilizing the computation of approximate Hessians along the optimization trajectory. This sharpness-minimization scheme is also crafted to accommodate lazy Hessian updates, securing efficiency in addition to flatness. To validate its effectiveness, we conduct a wide range of standard deep learning experiments in which Sassha demonstrates generalization performance that is comparable to, and mostly better than, that of other methods. We also provide a comprehensive set of analyses covering convergence, robustness, stability, efficiency, and cost.
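The abstract describes the method only at a high level. As a rough illustration of the ingredients it mentions, the sketch below combines a SAM-style sharpness-probing perturbation with a Hutchinson diagonal-Hessian preconditioner that is refreshed only every few steps (a "lazy" Hessian update), written in PyTorch. This is not the authors' actual algorithm; the function names (`hutchinson_diag_hessian`, `train_step`) and hyperparameters (`rho`, `lazy_k`) are hypothetical placeholders, and the official implementation is in the linked repository.

```python
import torch

def hutchinson_diag_hessian(loss_fn, params, n_samples=1):
    """Estimate diag(H) via Hutchinson's method: E[z * (Hz)] with Rademacher z."""
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params, create_graph=True)
    diag = [torch.zeros_like(p) for p in params]
    for _ in range(n_samples):
        zs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]  # Rademacher +/-1
        hvps = torch.autograd.grad(grads, params, grad_outputs=zs, retain_graph=True)
        for d, z, hv in zip(diag, zs, hvps):
            d.add_(z * hv / n_samples)
    return diag

def train_step(params, loss_fn, state, lr=0.1, rho=0.05, eps=1e-8, lazy_k=10):
    # (1) Sharpness probe: take an ascent step of radius rho in the gradient direction.
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params)
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(rho * g / grad_norm)

    # (2) Lazy Hessian: refresh the diagonal curvature estimate only every `lazy_k` steps,
    #     and keep it positive by taking absolute values (a common stabilization heuristic).
    if state["diag_h"] is None or state["step"] % lazy_k == 0:
        state["diag_h"] = [h.abs() for h in hutchinson_diag_hessian(loss_fn, params)]

    # (3) Gradient at the perturbed point, then a preconditioned descent step
    #     after undoing the perturbation.
    perturbed_grads = torch.autograd.grad(loss_fn(), params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(rho * g / grad_norm)
        for p, pg, h in zip(params, perturbed_grads, state["diag_h"]):
            p.sub_(lr * pg / (h + eps))
    state["step"] += 1
```

In a training loop, one would initialize `state = {"step": 0, "diag_h": None}` and define `loss_fn` as a closure over the model and the current mini-batch; a larger `lazy_k` trades curvature freshness for fewer Hessian-vector products, which is the efficiency lever the abstract alludes to.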
Lay Summary: How can we accelerate learning? One promising approach is to use second-order optimization methods, which utilize second-order derivatives to speed up convergence during training. Ironically, however, these methods often struggle to generalize to unseen data, thus missing the whole point of accelerating “learning”. What's going wrong, and how can we fix it? Our in-depth study suggests sharp minima as the culprit—a growing idea in recent research that makes intuitive sense: when the curvature of the loss landscape near minima is too steep, even small input changes can hurt performance. Building on this insight, we propose SASSHA, a novel second-order method designed to stably flatten the loss curvature. SASSHA not only improves generalization but also enhances efficiency by reducing the need for frequent second-order derivative computations, a primary source of computational overhead in second-order methods. In conclusion, we present a practical path for realizing the potential of second-order optimization methods through recovering their generalization and efficiency. We expect this methodology to further expand the applicability of second-order methods across a wide range of learning domains.
Link To Code: https://github.com/LOG-postech/Sassha
Primary Area: Deep Learning->Algorithms
Keywords: deep learning, second-order optimization, sharpness minimization
Submission Number: 3658