Abstract: We propose a novel white-box approach to hyper-parameter optimization. Motivated by recent work relating flat minima to generalization, we first connect the Hessian norm to established sharpness metrics. Building on this, we seek hyper-parameter configurations that promote flatness by minimizing an upper bound on the sharpness of the loss. Exploiting the structure of the underlying neural network, we derive semi-empirical estimates of the sharpness of the loss and search for hyper-parameters that minimize it in a randomized fashion. Through experiments on 14 classification datasets, we show that our method achieves strong performance at a fraction of the runtime.
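The randomized search described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `estimate_sharpness` is a hypothetical stand-in for the semi-empirical sharpness estimate, and the search space is invented for the example.

```python
import random

def estimate_sharpness(config):
    # Hypothetical stand-in for the paper's semi-empirical sharpness
    # estimate; here just a simple function of the hyper-parameters.
    return config["lr"] ** 2 / config["batch_size"]

def random_search(space, score, n_trials=100, seed=0):
    """Sample configurations uniformly at random and keep the one
    with the lowest sharpness estimate."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        s = score(cfg)
        if s < best_score:
            best_cfg, best_score = cfg, s
    return best_cfg, best_score

# Example (invented) search space over two hyper-parameters.
space = {"lr": [0.001, 0.01, 0.1], "batch_size": [32, 64, 128]}
best, best_score = random_search(space, estimate_sharpness)
```

Because the sharpness estimate is cheap to evaluate relative to full training runs, many configurations can be screened this way, which is the source of the claimed runtime savings.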
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Added more experiments based on reviewer R6ZG2's feedback
Assigned Action Editor: ~Yingbin_Liang1
Submission Number: 2945