Keywords: Optimization, Neural Networks, Gradient Descent, Gradient Dynamics, Scaling, SGD
TL;DR: This study analyzes gradient dynamics in CNNs and introduces a hyperparameter-free gradient normalization method that stabilizes training and improves accuracy on CIFAR-100.
Abstract: Gradient dynamics play a central role in determining the stability and generalization of deep neural networks. In this work, we provide an empirical analysis of how the variance and standard deviation of gradients evolve during training, showing consistent changes both across layers and at the global scale in convolutional networks. Motivated by these observations, we propose a hyperparameter-free gradient normalization method that aligns gradient scaling with the gradients' natural evolution. This approach prevents unintended amplification, stabilizes optimization, and preserves convergence guarantees. Experiments on the challenging CIFAR-100 benchmark with ResNet-20, ResNet-56, and VGG-16-BN demonstrate that our method maintains or improves test accuracy even under strong generalization. Beyond practical performance, our study highlights the importance of directly tracking gradient dynamics, aiming to bridge the gap between theoretical expectations and empirical behavior and to provide insights for future optimization research.
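The abstract describes a hyperparameter-free, per-layer gradient normalization applied before the optimizer update, but does not give the exact formula. The sketch below is a minimal illustrative example, not the authors' method: it assumes a simple scheme that rescales each layer's gradient by its own standard deviation inside a standard PyTorch training loop, to show where such a normalization step would fit.

```python
# Minimal sketch of a generic per-layer gradient normalization step in PyTorch.
# Assumption: rescaling each parameter tensor's gradient by its own standard
# deviation; the paper's actual scheme may differ.
import torch
import torch.nn as nn


def normalize_gradients(model: nn.Module, eps: float = 1e-12) -> None:
    """Rescale each parameter's gradient to roughly unit standard deviation."""
    for param in model.parameters():
        if param.grad is None:
            continue
        std = param.grad.std()
        if std > eps:  # skip near-constant (or single-element) gradients
            param.grad.div_(std)


# Usage inside an ordinary training loop (model, loader, criterion, optimizer
# are assumed to be defined elsewhere):
# for inputs, targets in loader:
#     optimizer.zero_grad()
#     loss = criterion(model(inputs), targets)
#     loss.backward()
#     normalize_gradients(model)  # normalize gradients before the update
#     optimizer.step()
```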
Submission Number: 155