Published: 01 Jan 2023, Last Modified: 22 Sept 2023, ICML 2023, Readers: Everyone
Abstract: When training neural networks, it has been widely observed that a large step size is essential in stochastic gradient descent (SGD) for obtaining superior models. However, the effect of large step ...