Normalized Flat Minima: Exploring Scale Invariant Definition of Flat Minima for Neural Networks Using PAC-Bayesian Analysis

29 Aug 2020 | OpenReview Archive Direct Upload | Readers: Everyone
Abstract: The notion of flat minima has gained attention as a key metric of the generalization ability of deep learning models. However, current definitions of flatness are known to be sensitive to parameter rescaling. While some previous studies have proposed rescaling flatness metrics by parameter scales to avoid this scale dependence, the normalized metrics lose the direct theoretical connection between flat minima and generalization. We first provide generalization error bounds based on existing normalized flatness measures for smooth and stochastic networks, using a second-order approximation. Building on this analysis, we then propose a novel normalized flatness metric. The proposed metric enjoys both a direct theoretical connection to generalization error and a better empirical correlation with it.
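
The scale sensitivity mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's proposed metric: it assumes a hypothetical two-parameter model f(x) = w2 * w1 * x with squared loss, uses the Hessian trace as the unnormalized flatness measure, and uses a simple parameter-scale-weighted Hessian diagonal as a stand-in for the kind of normalized measure the abstract refers to. All names (`loss`, `sharpness`, `scaled_sharpness`, `alpha`) and numeric values are illustrative assumptions.

```python
import math


def loss(w1, w2, x=1.5, y=2.0):
    # Squared error of the toy model f(x) = w2 * w1 * x (hypothetical setup).
    return (w2 * w1 * x - y) ** 2


def hessian_diag(w1, w2, x=1.5):
    # Exact diagonal second derivatives of the loss above:
    # d2L/dw1^2 = 2 (w2 x)^2 and d2L/dw2^2 = 2 (w1 x)^2.
    return 2.0 * (w2 * x) ** 2, 2.0 * (w1 * x) ** 2


def sharpness(w1, w2):
    # Unnormalized flatness proxy: trace of the Hessian.
    h1, h2 = hessian_diag(w1, w2)
    return h1 + h2


def scaled_sharpness(w1, w2):
    # Illustrative normalized proxy: each curvature term is weighted
    # by the squared scale of its own parameter.
    h1, h2 = hessian_diag(w1, w2)
    return w1 ** 2 * h1 + w2 ** 2 * h2


w1, w2, alpha = 0.8, 1.2, 10.0
# Rescale so that the product w1 * w2, and hence the function, is unchanged.
r1, r2 = alpha * w1, w2 / alpha

print("loss:            ", loss(w1, w2), loss(r1, r2))
print("Hessian trace:   ", sharpness(w1, w2), sharpness(r1, r2))
print("scaled sharpness:", scaled_sharpness(w1, w2), scaled_sharpness(r1, r2))
```

In this toy setting the loss is identical before and after rescaling, the Hessian trace changes by a large factor, and the scale-weighted quantity stays the same. This only shows why normalization by parameter scale removes the dependence; it says nothing about the generalization bounds, which require the paper's analysis.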