Published: 01 Jan 2020, Last Modified: 15 May 2023, ICML 2020, Readers: Everyone
Abstract: It is well known that (stochastic) gradient descent has an implicit bias towards flat minima. In deep neural network training, this mechanism serves to screen out minima. However, the precise effec...