Keywords: Predictive Coding, Backpropagation, Deep Neural Networks, Loss Landscape, Saddle Points, Gradient Descent, Vanishing Gradients, Local Learning, Inference Learning
TL;DR: Predictive coding inference makes the loss landscape of feedforward neural networks more benign and robust to vanishing gradients.
Abstract: Predictive coding (PC) is an energy-based learning algorithm that performs iterative inference over network activities before updating weights. Recent work suggests that PC can converge in fewer learning steps than backpropagation thanks to its inference procedure. However, these advantages are not always observed, and the impact of PC inference on learning is not theoretically well understood. To address this gap, we study the geometry of the PC weight landscape at the inference equilibrium of the network activities. For deep linear networks, we first show that the equilibrated PC energy is equal to a rescaled mean squared error loss with a weight-dependent rescaling. We then prove that many highly degenerate (non-strict) saddles of the loss, including the origin, become much easier to escape (strict) in the equilibrated energy. Experiments on both linear and non-linear networks strongly validate our theory and further suggest that all the saddles of the equilibrated energy are strict. Overall, this work shows that PC inference makes the loss landscape of feedforward networks more benign and robust to vanishing gradients, while also highlighting the fundamental challenge of scaling PC to very deep models.
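The PC procedure the abstract describes (iterative inference over activities to minimize an energy, followed by a weight update at the inference equilibrium) can be sketched for a deep linear network as follows. This is a minimal illustrative implementation, not the paper's code; the layer sizes, step sizes, and iteration counts are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Deep linear network: layer l predicts x_l ≈ W_l @ x_{l-1}.
# PC energy: F = sum_l 0.5 * ||x_l - W_l x_{l-1}||^2 (illustrative form).
sizes = [4, 8, 8, 2]  # arbitrary architecture for demonstration
Ws = [rng.normal(scale=0.1, size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]

def energy(xs, Ws):
    """Total prediction-error energy over all layers."""
    return sum(0.5 * np.sum((xs[l + 1] - W @ xs[l]) ** 2)
               for l, W in enumerate(Ws))

def pc_step(x_in, y, Ws, n_inf=100, lr_x=0.1, lr_w=0.01):
    """One PC learning step: inference on activities, then a weight update."""
    # Initialize activities with a forward pass; clamp input and target.
    xs = [x_in]
    for W in Ws:
        xs.append(W @ xs[-1])
    xs[-1] = y.copy()
    # Inference: gradient descent on hidden activities (input/output clamped).
    for _ in range(n_inf):
        eps = [xs[l + 1] - W @ xs[l] for l, W in enumerate(Ws)]  # errors
        for l in range(1, len(xs) - 1):
            # dF/dx_l = eps_{l-1} (error at this layer) - W_l^T eps_l (from above)
            xs[l] -= lr_x * (eps[l - 1] - Ws[l].T @ eps[l])
    # Learning: local weight update at the (approximate) inference equilibrium.
    eps = [xs[l + 1] - W @ xs[l] for l, W in enumerate(Ws)]
    for l in range(len(Ws)):
        Ws[l] += lr_w * np.outer(eps[l], xs[l])
    return energy(xs, Ws)

x_in = rng.normal(size=sizes[0])
y = rng.normal(size=sizes[-1])
e_hist = [pc_step(x_in, y, Ws) for _ in range(20)]
```

Note that each weight update uses only the local prediction error and the local presynaptic activity, which is the "local learning" property listed in the keywords; the theoretical results concern the landscape of the energy evaluated at the inference equilibrium.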
Primary Area: Neuroscience and cognitive science (neural coding, brain-computer interfaces)
Submission Number: 10413