Energy-Efficient Deep Learning via Update Sampling from a Generalized Gaussian Distribution: An Empirical Study

TMLR Paper 5151 Authors

18 Jun 2025 (modified: 30 Jun 2025) · Under review for TMLR · CC BY 4.0
Abstract: The computation of loss gradients via backpropagation accounts for a significant portion of the energy consumed in training deep learning (DL) models. This paper introduces a simple yet effective method to reduce energy usage during training by leveraging the overparameterization of DL models. Under the overparameterization assumption, the loss landscape is smooth, and we hypothesize that gradient elements follow a Generalized Gaussian Distribution (GGD). Based on this hypothesis, energy savings are achieved by skipping entire training epochs and estimating gradients by sampling from a GGD: parameter updates during skipped epochs are performed by adding GGD-based samples of gradient components to the model parameters from the previous epoch. We also present a theorem that provides an upper bound on the expected loss, along with the corresponding convergence rate. We provide extensive empirical validation of the GGD hypothesis across image classification, object detection, and image segmentation tasks, using widely adopted DL models; results show substantial reductions in energy consumption without compromising model performance. Additionally, we evaluate our method on Domain Adaptation (DA), Domain Generalization (DG), and Federated Learning (FL) tasks, observing similar energy savings. To further validate the adaptability of our sampling strategy, we also test it in large language model (LLM) pre-training, demonstrating its effectiveness across diverse settings.
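The core update rule described in the abstract (replacing backpropagated gradients with GGD samples during skipped epochs) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `ggd_update` and the shape/location/scale values are hypothetical, and in practice the GGD parameters would presumably be fit to gradients observed during non-skipped epochs.

```python
import numpy as np
from scipy.stats import gennorm  # generalized Gaussian (generalized normal)

def ggd_update(params, beta, loc, scale, lr, seed=None):
    """Sketch of a skipped-epoch update: instead of computing a gradient
    by backpropagation, draw a surrogate gradient from a Generalized
    Gaussian Distribution and apply a standard SGD-style step.

    beta, loc, scale: GGD shape, location, and scale (illustrative values;
    a real system would estimate them from previously observed gradients).
    """
    rng = np.random.default_rng(seed)
    # gennorm has density proportional to exp(-|x - loc|^beta / scale^beta);
    # beta = 2 recovers the Gaussian shape, beta = 1 the Laplacian.
    surrogate_grad = gennorm.rvs(beta, loc=loc, scale=scale,
                                 size=params.shape, random_state=rng)
    return params - lr * surrogate_grad

# Illustrative use on a toy parameter vector
w = np.zeros(5)
w_next = ggd_update(w, beta=1.5, loc=0.0, scale=1e-3, lr=0.1, seed=0)
```

Because no forward or backward pass is needed for such a step, its cost is dominated by random number generation, which is where the claimed energy savings would come from.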
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Pierre_Ablin2
Submission Number: 5151