$\mu$PC: Scaling Predictive Coding to 100+ Layer Networks

Published: 18 Sept 2025 · Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: predictive coding, mup, optimisation theory, hyperparameter transfer, inference learning, local learning
TL;DR: We introduce $\mu$PC, a reparameterisation of predictive coding networks that enables stable training of 100+ layer ResNets on simple tasks with hyperparameter transfer.
Abstract: The biological implausibility of backpropagation (BP) has motivated many alternative, brain-inspired algorithms that attempt to rely only on local information, such as predictive coding (PC) and equilibrium propagation. However, these algorithms have notoriously struggled to train very deep networks, preventing them from competing with BP in large-scale settings. Indeed, scaling PC networks (PCNs) has recently been posed as a challenge for the community (Pinchetti et al., 2024). Here, we show that 100+ layer PCNs can be trained reliably using a Depth-$\mu$P parameterisation (Yang et al., 2023; Bordelon et al., 2023) which we call "$\mu$PC". By analysing the scaling behaviour of PCNs, we reveal several pathologies that make standard PCNs difficult to train at large depths. We then show that, despite addressing only some of these instabilities, $\mu$PC allows stable training of very deep (up to 128-layer) residual networks on simple classification tasks with competitive performance and little tuning compared to current benchmarks. Moreover, $\mu$PC enables zero-shot transfer of both weight and activity learning rates across widths and depths. Our results serve as a first step towards scaling PC to more complex architectures and have implications for other local algorithms. Code for $\mu$PC is made available as part of a JAX library for PCNs.
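To make the setup concrete, below is a minimal, illustrative JAX sketch of a residual PCN with a Depth-$\mu$P-style $1/\sqrt{L}$ residual-branch scaling: inference relaxes the activities by gradient descent on the PC energy, and learning then applies local weight updates at the relaxed activities. This is not the authors' implementation; the layer sizes, learning rates, and exact scalings (including how the activity and weight learning rates are rescaled with width and depth in $\mu$PC) are assumptions for illustration.

```python
# Minimal, illustrative sketch (NOT the paper's implementation): a residual
# predictive coding network (PCN) with a Depth-muP-style 1/sqrt(L) residual
# branch scaling. Inference relaxes activities on the PC energy; learning then
# applies local weight updates. All names and hyperparameters are assumptions.
import jax
import jax.numpy as jnp

L, N = 16, 128                       # depth (residual blocks) and width -- illustrative
LR_Z, LR_W, N_INFER = 0.1, 1e-2, 20  # activity lr, weight lr, inference steps (assumed)

key = jax.random.PRNGKey(0)
Ws = [jax.random.normal(k, (N, N)) / jnp.sqrt(N)      # 1/sqrt(fan-in) init
      for k in jax.random.split(key, L)]

def block(W, z):
    # Residual block with a 1/sqrt(L) branch scaling (Depth-muP style).
    return z + (1.0 / jnp.sqrt(L)) * W @ jax.nn.relu(z)

def energy(Ws, zs, x, y):
    # PC energy: squared prediction error between each activity and the
    # prediction from the layer below; input x and target y are clamped.
    zs_full = [x] + list(zs) + [y]
    F = 0.0
    for W, z_prev, z in zip(Ws, zs_full[:-1], zs_full[1:]):
        F = F + 0.5 * jnp.sum((z - block(W, z_prev)) ** 2)
    return F

@jax.jit
def train_step(Ws, x, y):
    # Initialise hidden activities with a feedforward pass.
    zs, z = [], x
    for W in Ws[:-1]:
        z = block(W, z)
        zs.append(z)
    # 1) Inference: gradient descent on the activities (driven by local prediction errors).
    dF_dz = jax.grad(energy, argnums=1)
    for _ in range(N_INFER):
        zs = [z - LR_Z * g for z, g in zip(zs, dF_dz(Ws, zs, x, y))]
    # 2) Learning: local weight updates at the relaxed activities.
    gW = jax.grad(energy, argnums=0)(Ws, zs, x, y)
    return [W - LR_W * g for W, g in zip(Ws, gW)]

# One toy update on random data.
x = jax.random.normal(jax.random.PRNGKey(1), (N,))
y = jax.random.normal(jax.random.PRNGKey(2), (N,))
Ws = train_step(Ws, x, y)
```

The full $\mu$PC parameterisation, including the width- and depth-dependent rescalings that enable zero-shot transfer of the weight and activity learning rates, is given in the paper and provided as part of the accompanying JAX library for PCNs.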
Primary Area: Neuroscience and cognitive science (e.g., neural coding, brain-computer interfaces)
Submission Number: 17454