Extrapolation for Large-batch Training in Deep Learning

2020 (modified: 22 Sept 2023), ICML 2020, Readers: Everyone
Abstract: Deep learning networks are typically trained by Stochastic Gradient Descent (SGD) methods that iteratively improve the model parameters by estimating a gradient on a very small fraction of the trai...