Achieving small-batch accuracy with large-batch scalability via Hessian-aware learning rate adjustment

Published: 01 Jan 2023, Last Modified: 31 Jan 2025. Neural Networks 2023. CC BY-SA 4.0
Abstract: Highlights
• Hessian information makes it possible to properly adjust the noise scale in large-batch training.
• Decaying the learning rate too early harms the underlying margin distribution.
• The minimum learning rate after the decay strongly affects the model sharpness.
• The length of the noise scale transition affects the final generalization performance.