Keywords: Neuroscience, Hebbian Learning, Gradient Descent
TL;DR: Hebbian dynamics can emerge from regularized SGD and other learning algorithms.
Abstract: Stochastic gradient descent (SGD) is often viewed as biologically implausible, while local Hebbian rules dominate theories of synaptic plasticity in the brain. We prove, and demonstrate empirically on small MLPs and transformers that can be trained on a single GPU, that SGD with weight decay can naturally produce Hebbian-like dynamics near stationarity, whereas injected gradient noise can flip the alignment to anti-Hebbian. The effect holds for nearly any learning rule, even some random ones, revealing Hebbian behavior as an emergent epiphenomenon of deeper optimization dynamics during training. These results narrow the gap between artificial and biological learning and caution against treating observed Hebbian signatures as evidence against global, error-driven mechanisms in the brain.
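As one way to make the abstract's claim concrete, the sketch below (not the authors' notebook linked under Code) tracks the cosine similarity between a layer's actual SGD-with-weight-decay update ΔW and a batch-averaged Hebbian outer product post ⊗ pre. The toy regression task, the two-layer tanh MLP, and the learning-rate/weight-decay values are illustrative assumptions, not the paper's setup; whether and when the alignment turns positive (Hebbian-like) depends on the training regime, with the paper's prediction applying near stationarity.

```python
# Minimal sketch: probe whether SGD-with-weight-decay updates of one layer
# align with a Hebbian outer-product term (post ⊗ pre). Toy data, toy MLP,
# and all hyperparameters are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy regression data standing in for any small task.
X = torch.randn(512, 32)
Y = torch.randn(512, 1)

model = nn.Sequential(nn.Linear(32, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-2)
loss_fn = nn.MSELoss()

W = model[0].weight  # probe the first layer's weight matrix (shape 64 x 32)

for step in range(5001):
    idx = torch.randint(0, X.shape[0], (64,))
    x, y = X[idx], Y[idx]

    with torch.no_grad():
        pre = x                         # presynaptic activity into layer 0
        post = torch.tanh(model[0](x))  # postsynaptic activity out of layer 0

    W_before = W.detach().clone()
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()  # applies ΔW = -lr * (grad + weight_decay * W)

    delta_w = (W.detach() - W_before).flatten()
    hebb = (post.T @ pre / x.shape[0]).flatten()  # batch-averaged Hebbian proxy

    if step % 1000 == 0:
        align = torch.cosine_similarity(delta_w, hebb, dim=0).item()
        print(f"step {step:5d}  cos(ΔW, post⊗pre) = {align:+.3f}")
```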
Code: ipynb
Submission Number: 16