Training Deep Networks with Stochastic Gradient Normalized by Layerwise Adaptive Second Moments
Boris Ginsburg, Patrice Castonguay, Oleksii Hrinchuk, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Huyen Nguyen, Yang Zhang, Jonathan M. Cohen
23 Sept 2020 (modified: 05 May 2023) | ICLR 2020