Keywords: optimization algorithms, adaptive learning rate, gradient-based optimization, curvature estimation, neural network training, convergence acceleration
Abstract: Improving the learning efficiency of deep learning models remains a significant research focus. In this paper, we propose EAGLE (Early Approximated Gradient-based Learning-rate Estimator), a novel optimization method that accelerates parameter optimization. First, to achieve faster loss convergence, EAGLE employs a unique parameter update rule that leverages the local curvature of the loss landscape, derived from gradient variations between consecutive training steps. Second, to enhance training stability, it introduces a branching mechanism that adaptively switches to the existing Adam update rule under conditions where the EAGLE update rule might become unstable (e.g., extremely small gradient differences or locally upward-convex regions). In experiments on the GLUE SST-2 text classification task using a pre-trained GPT-2 model, EAGLE reached the final loss value of SGD with momentum 6.83× faster and that of Adam 6.77× faster. Similarly, on the CIFAR-10 image classification task using a pre-trained ViT-B/16 model, EAGLE reached the final loss value of SGD with momentum 3.41× faster and that of Adam 6.60× faster. To ensure reproducibility and promote further improvements, our code is publicly available on GitHub: https://github.com/keiotakmin/EAGLE
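To make the high-level description in the abstract concrete, the following is a minimal, illustrative sketch of the idea as a PyTorch optimizer. It is not the authors' implementation (available at the GitHub link above): the exact update rule, the coordinate-wise secant curvature estimate, the switching thresholds, and the hyperparameter names (delta_min, eps, betas) are all assumptions for illustration only.

```python
import torch
from torch.optim import Optimizer


class EagleSketch(Optimizer):
    """Hypothetical sketch: estimate local curvature from the change in gradients
    between consecutive steps, use it to scale the step, and fall back to an
    Adam-style update when the estimate is unreliable (tiny gradient difference
    or negative curvature, i.e. a locally upward-convex region)."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, delta_min=1e-12):
        defaults = dict(lr=lr, betas=betas, eps=eps, delta_min=delta_min)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            lr, (b1, b2) = group["lr"], group["betas"]
            eps, delta_min = group["eps"], group["delta_min"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                g = p.grad
                state = self.state[p]
                if len(state) == 0:
                    state["step"] = 0
                    state["prev_grad"] = torch.zeros_like(p)
                    state["prev_param"] = p.detach().clone()
                    state["exp_avg"] = torch.zeros_like(p)      # Adam first moment
                    state["exp_avg_sq"] = torch.zeros_like(p)   # Adam second moment
                state["step"] += 1
                t = state["step"]

                # Adam moments are always maintained so the fallback branch is available.
                state["exp_avg"].mul_(b1).add_(g, alpha=1 - b1)
                state["exp_avg_sq"].mul_(b2).addcmul_(g, g, value=1 - b2)
                m_hat = state["exp_avg"] / (1 - b1 ** t)
                v_hat = state["exp_avg_sq"] / (1 - b2 ** t)
                adam_step = -lr * m_hat / (v_hat.sqrt() + eps)

                if t == 1:
                    # No previous gradient yet: take a plain Adam step.
                    update = adam_step
                else:
                    delta_g = g - state["prev_grad"]    # gradient change between steps
                    delta_p = p - state["prev_param"]   # parameter change between steps
                    # Coordinate-wise secant curvature estimate (assumed form).
                    safe_dp = torch.where(delta_p.abs() > delta_min, delta_p,
                                          torch.full_like(delta_p, delta_min))
                    curvature = delta_g / safe_dp
                    # Branching: use the curvature-scaled step only where it is reliable.
                    use_eagle = (delta_g.abs() > delta_min) & (curvature > 0)
                    eagle_step = -lr * g / (curvature + eps)  # lr scaling is an assumption
                    update = torch.where(use_eagle, eagle_step, adam_step)

                state["prev_grad"].copy_(g)
                state["prev_param"].copy_(p)
                p.add_(update)
        return loss
```

The per-coordinate switch (torch.where) reflects the abstract's description that the fallback to Adam is triggered by extremely small gradient differences or locally upward-convex shapes; keeping the Adam moments updated on every step is one simple way to make that fallback seamless, though the paper's actual branching criteria may differ.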
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 16652