AdaS: Adaptive Scheduling of Stochastic Gradients

28 Sept 2020 (modified: 22 Oct 2023) · ICLR 2021 Conference Withdrawn Submission
Keywords: Adaptive Stochastic Optimization, Deep Convolutional Neural Network, Low-Rank Factorization
Abstract: The choice of learning rate has been explored in many stochastic optimization frameworks to adaptively tune the step size of gradients in the iterative training of deep neural networks. While adaptive optimizers (e.g. AdaM, AdaGrad, RMSProp, AdaBound) offer fast convergence, they exhibit poor generalization characteristics. To achieve better performance, manual scheduling of learning rates (e.g. step decay, cyclical learning, warmup) is often used, but it requires expert domain knowledge. Such scheduling provides limited insight into the nature of the update rules, and recent studies show that different generalization characteristics are observed under different experimental setups. In this paper, rather than relying on raw statistical measurements of gradients (as many adaptive optimizers do), we explore the useful information carried between gradient updates. We measure the energy norm of the low-rank factorization of convolution weights in a convolutional neural network to define two probing metrics: knowledge gain and mapping condition. By means of these metrics, we provide empirical insight into the different generalization characteristics of adaptive optimizers. Further, we propose a new optimizer, AdaS, which adaptively regulates the learning rate by tracking the rate of change in knowledge gain. Experiments across several setups reveal that AdaS exhibits faster convergence and superior generalization over existing adaptive learning methods.
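To make the abstract's mechanism concrete, below is a minimal PyTorch sketch, not the authors' reference implementation. The names `knowledge_gain` and `adas_step_size`, the energy threshold, and the constants `beta` and `zeta` are illustrative assumptions: knowledge gain is approximated here as the spectral energy retained by a low-rank factorization of the unfolded convolution weights, and the learning-rate rule simply tracks its epoch-to-epoch change, mirroring the idea described above rather than reproducing the paper's exact update.

```python
import torch


def knowledge_gain(conv_weight: torch.Tensor, energy_threshold: float = 0.9) -> float:
    """Approximate a layer's knowledge gain from a low-rank view of its weights."""
    # Unfold (out_channels, in_channels, kH, kW) into a 2-D matrix.
    w = conv_weight.detach().reshape(conv_weight.shape[0], -1)
    s = torch.linalg.svdvals(w)               # singular values, descending order
    energy = s.pow(2)
    cum = torch.cumsum(energy, dim=0) / energy.sum()
    # Smallest rank whose retained spectral energy exceeds the threshold.
    rank = min(int((cum < energy_threshold).sum().item()) + 1, s.numel())
    # Fraction of spectral energy captured by the retained low-rank factorization.
    return (energy[:rank].sum() / energy.sum()).item()


def adas_step_size(prev_lr: float, gain_prev: float, gain_curr: float,
                   beta: float = 0.8, zeta: float = 1.0) -> float:
    """Illustrative rule (not necessarily the paper's exact update):
    scale the learning rate by the rate of change in knowledge gain."""
    delta = max(gain_curr - gain_prev, 0.0)
    return beta * prev_lr + zeta * delta * prev_lr


if __name__ == "__main__":
    conv = torch.nn.Conv2d(16, 32, kernel_size=3)
    g0 = knowledge_gain(conv.weight)
    # ... one epoch of SGD updates to conv.weight would happen here ...
    g1 = knowledge_gain(conv.weight)
    print(adas_step_size(0.01, g0, g1))
```

In this sketch the per-layer learning rate decays geometrically when the layer's low-rank energy stops changing and is boosted when the layer is still acquiring structure, which is one way to read "tracking the rate of change in knowledge gain."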
One-sentence Summary: A new adaptive stochastic optimizer, AdaS, is proposed for training deep convolutional neural networks; it exhibits faster convergence than existing adaptive methods while maintaining the generalization ability of SGD.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/arxiv:2006.06587/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=dQzr1dNLkD