Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

ICML 2018 (modified: 11 Nov 2022)
Abstract: In several recently proposed stochastic optimization methods (e.g. RMSProp, Adam, Adadelta), parameter updates are scaled by the inverse square roots of exponential moving averages of squared past ...
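The update rule the abstract alludes to can be sketched as follows. This is an illustrative RMSProp-style step (the family of methods the abstract names, not the paper's Adafactor algorithm itself); the function name and default hyperparameters are assumptions for the sketch:

```python
import numpy as np

def rmsprop_style_update(param, grad, v, lr=1e-3, decay=0.9, eps=1e-8):
    """One step of the update family described in the abstract:
    scale the gradient by the inverse square root of an exponential
    moving average of squared past gradients.
    Sketch only -- names and defaults are assumptions."""
    v = decay * v + (1.0 - decay) * grad ** 2      # EMA of squared gradients
    param = param - lr * grad / (np.sqrt(v) + eps)  # inverse-sqrt scaling
    return param, v
```

Note that `v` has the same shape as `param`, which is the per-parameter memory cost that Adafactor reduces to sublinear.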
