Scaled Gradient Mean Subtraction: A Lightweight Method for Amplifying Underutilized Gradient Directions

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Gradient Modification, Training Efficiency, Deep Learning Optimization
TL;DR: We propose Scaled Gradient Mean Subtraction (SGMS), a lightweight optimization method that modifies gradients to amplify underutilized directions in the weight parameter space, improving training efficiency in DNNs without extra memory cost.
Abstract: We propose Scaled Gradient Mean Subtraction (SGMS), a lightweight method that improves neural network training by amplifying underutilized gradient directions. In mini-batch training, gradients from individual samples are expected to point in diverse directions, but in practice they are often highly correlated, spanning only a few dominant directions. Consequently, weight updates are confined to a low-rank subspace, leaving many directions underutilized. SGMS addresses this imbalance by subtracting a scaled mean from the gradient. With common ReLU-like activations (e.g., ReLU, GeLU, SiLU), this simple operation weakens the most dominant direction, allowing complementary directions to play a greater role in optimization. Unlike approaches that rely on costly covariance statistics or matrix decompositions, SGMS achieves a similar rebalancing of gradient directions with a single mean-subtraction step, adding negligible overhead and requiring no architectural changes. Formally, SGMS generalizes Gradient Centralization (GC) as a special case, but—by partially rather than fully suppressing the mean direction—it retains valuable gradient components that GC eliminates. Experiments on CIFAR and ImageNet across multiple architectures show consistent accuracy improvements, validating SGMS as a practical and flexible framework for rebalancing gradient directions.
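The abstract describes SGMS as subtracting a scaled mean from the gradient, with Gradient Centralization recovered when the mean is fully subtracted. A minimal sketch of that operation might look as follows; the function name `sgms` and the scaling parameter `alpha` are assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

def sgms(grad, alpha=0.5):
    """Hypothetical sketch of Scaled Gradient Mean Subtraction.

    Subtracts alpha times the per-filter mean from the gradient.
    alpha = 1.0 would recover Gradient Centralization (full mean
    removal); alpha in (0, 1) only partially suppresses the mean
    direction, retaining some of the dominant gradient component.
    """
    # Average over all axes except the first (output-channel) axis,
    # mirroring Gradient Centralization's per-filter centering.
    axes = tuple(range(1, grad.ndim))
    mean = grad.mean(axis=axes, keepdims=True)
    return grad - alpha * mean
```

With `alpha = 0` the gradient is unchanged, and with `alpha = 1` each filter's gradient is exactly centered, matching the paper's claim that SGMS generalizes GC as a special case.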
Primary Area: optimization
Submission Number: 8585