Abstract: We propose a novel white-box approach to hyper-parameter optimization. Motivated by recent work relating flat minima to generalization, we first connect the Hessian norm to established sharpness metrics. Building on this, we seek hyper-parameter configurations that promote flatness by minimizing an upper bound on the sharpness of the loss. Exploiting the structure of the underlying neural network, we derive semi-empirical estimates of the sharpness of the loss and search for hyper-parameters that minimize it in a randomized fashion. Through experiments on 14 classification datasets, we show that our method achieves strong performance at a fraction of the runtime.
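The randomized search described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `estimate_sharpness` is a hypothetical stand-in for the semi-empirical sharpness estimate, and the search space is invented for the example.

```python
import random

def estimate_sharpness(config):
    # Hypothetical stand-in for the paper's semi-empirical sharpness
    # estimate; here just a simple function of the hyper-parameters.
    return config["lr"] ** 2 / config["batch_size"]

def random_search(space, score, n_trials=100, seed=0):
    """Sample configurations uniformly at random and keep the one
    with the lowest sharpness estimate."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        s = score(cfg)
        if s < best_score:
            best_cfg, best_score = cfg, s
    return best_cfg, best_score

# Example (invented) search space over two hyper-parameters.
space = {"lr": [0.001, 0.01, 0.1], "batch_size": [32, 64, 128]}
best, best_score = random_search(space, estimate_sharpness)
```

Because the sharpness estimate is cheap to evaluate relative to full training runs, many configurations can be screened this way, which is the source of the claimed runtime savings.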
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Added more experiments based on reviewer R6ZG2's feedback
Assigned Action Editor: ~Yingbin_Liang1
Submission Number: 2945