Keywords: gradient noise, stochastic gradient descent, convolutional neural networks, layer-wise analysis, directional consistency, covariance structure
TL;DR: Small CNNs exhibit reproducible, layer-dependent gradient noise patterns across seeds, datasets, and optimizers, revealing structured components in SGD beyond randomness.
Abstract: Despite the remarkable success of deep learning, many aspects of its training dynamics remain poorly understood. In particular, it is unclear whether the stochastic gradient updates produced under different random initializations exhibit any reproducible structure. In this work, we conduct a systematic empirical study of small convolutional neural networks (CNNs) trained on standard vision datasets to test whether patterns in gradient noise are consistent across independent training runs. Over multiple random seeds, we track per-layer gradient norms, directions, and correlations, and observe stable, layer-dependent trends in gradient behavior. Early layers consistently exhibit higher directional alignment of gradient updates than deeper layers, while later layers display greater variability. These patterns persist across the architectures, datasets, and optimization settings we examine, suggesting that gradient noise may contain structured components beyond purely random fluctuations. Rather than aiming to establish definitive laws, this study provides an exploratory experimental framework for probing stochastic gradient dynamics and highlights empirical regularities that may inform future theoretical and experimental investigations of deep learning.
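The abstract does not define how per-layer gradient norms and directional alignment are computed. As one possible reading, the sketch below assumes that alignment for a layer is the mean pairwise cosine similarity among a set of flattened gradient vectors for that layer (for example, gradients from consecutive minibatches within one run), and that reproducibility across seeds is assessed by comparing these per-layer summaries between runs. The function names, the metric choice, and the toy data are hypothetical illustrations, not the authors' method.

```python
import math
from itertools import combinations


def l2_norm(v):
    """Euclidean norm of a flattened gradient vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = l2_norm(u), l2_norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def directional_alignment(grads):
    """Assumed metric: mean pairwise cosine similarity among gradient
    vectors for one layer (e.g. successive minibatch gradients)."""
    pairs = list(combinations(grads, 2))
    if not pairs:
        raise ValueError("need at least two gradient vectors")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def layer_summary(per_layer_grads):
    """Hypothetical helper: maps layer name -> list of flattened gradients
    to per-layer mean gradient norm and directional alignment."""
    summary = {}
    for layer, grads in per_layer_grads.items():
        norms = [l2_norm(g) for g in grads]
        summary[layer] = {
            "mean_norm": sum(norms) / len(norms),
            "alignment": directional_alignment(grads),
        }
    return summary


# Toy, made-up gradients for one run: an "early" layer whose updates point
# in nearly the same direction, and a "late" layer whose updates disagree.
toy_run = {
    "conv1": [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0]],
    "fc": [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
}
summary = layer_summary(toy_run)
for name, stats in summary.items():
    print(name, round(stats["mean_norm"], 4), round(stats["alignment"], 4))
```

Under this assumed metric, a layer-dependent trend like the one described in the abstract would appear as consistently higher alignment for early layers in every seed's summary. Cosine similarity is used because it ignores gradient magnitude, which is reported separately as the mean norm.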
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Style Files: I have used the style files.
Submission Number: 9