AutoDrop: Training Deep Learning Models with Automatic Learning Rate Drop

Jing Wang; Yunfei Teng; Anna Ewa Choromanska

Published: 26 Apr 2024, Last Modified: 15 Jul 2024. Venue: UAI 2024 poster. License: CC BY 4.0

Keywords: optimization in deep learning, automatic learning rate

TL;DR: We develop an algorithm, AutoDrop, that realizes the learning rate drop automatically and stems from the properties of the learning dynamics of DL systems.

Abstract: Modern deep learning (DL) architectures are trained using variants of the SGD algorithm and typically rely on the user to manually drop the learning rate when the training curve saturates. In this paper, we develop an algorithm, which we call AutoDrop, that performs the learning rate drop automatically and stems from the properties of the learning dynamics of DL systems. Specifically, it is motivated by the observation that, for a fixed learning rate, the angular velocity of the model parameters (i.e., the velocity of the changes of the convergence direction) initially increases rapidly and then progresses towards soft saturation. At saturation, the optimizer slows down, so the saturation of the angular velocity is a good indicator for dropping the learning rate. After the drop, the angular velocity "resets" and follows the pattern described above, increasing again until saturation. AutoDrop is built on this idea and drops the learning rate whenever the angular velocity saturates. The method is simple to implement, computationally cheap, and by design avoids the short-horizon bias problem. We show that AutoDrop achieves favorable performance compared to many different baseline manual and automatic learning rate schedulers, and matches the SOTA performance in all our experiments. On the theoretical front, we claim two contributions: (i) we formulate the learning rate behavior based on the angular velocity, and (ii) we provide a general convergence theory for learning rate schedulers that decrease the learning rate step-wise, rather than continuously as is commonly analyzed.
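The saturation-triggered drop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact saturation test, so the names `angle_between` and `SaturationDropper`, the parameters `tol`, `patience`, and `drop_factor`, and the rule "the angular velocity has saturated once its epoch-to-epoch increase stays below `tol` for `patience` consecutive epochs" are all assumptions made purely for illustration.

```python
import math


def angle_between(u, v):
    """Angle in radians between two update directions (lists of floats).

    Assumption: angular velocity is approximated by the angle between
    consecutive parameter-update vectors.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos)


class SaturationDropper:
    """Hypothetical helper: drops the learning rate step-wise once the
    per-epoch angular velocity stops increasing (illustrative rule only)."""

    def __init__(self, lr, drop_factor=0.1, tol=0.01, patience=2):
        self.lr = lr
        self.drop_factor = drop_factor  # multiplicative drop (assumed)
        self.tol = tol                  # minimum increase that counts as growth
        self.patience = patience        # flat epochs required before a drop
        self.prev_velocity = None
        self.flat_epochs = 0

    def step(self, velocity):
        """Feed one epoch's mean angular velocity; return the current lr."""
        if self.prev_velocity is not None:
            if velocity - self.prev_velocity < self.tol:
                self.flat_epochs += 1
            else:
                self.flat_epochs = 0
        self.prev_velocity = velocity
        if self.flat_epochs >= self.patience:
            self.lr *= self.drop_factor
            # After a drop the angular velocity "resets", so tracking restarts.
            self.prev_velocity = None
            self.flat_epochs = 0
        return self.lr


# Usage with a made-up velocity trace that rises quickly and then flattens.
dropper = SaturationDropper(lr=0.1)
lrs = [dropper.step(v) for v in [0.2, 0.5, 0.7, 0.705, 0.706]]
```

In this toy trace the learning rate stays at 0.1 while the velocity is still rising and is multiplied by `drop_factor` only after two consecutive flat epochs. In a real training loop the velocity fed to `step` would be computed from the model's parameter updates, for example by averaging `angle_between` over consecutive updates within an epoch.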

Supplementary Material: zip

List Of Authors: Wang, Jing and Teng, Yunfei and Choromanska, Anna

Latex Source Code: zip

Signed License Agreement: pdf

Submission Number: 179
