Abstract: In recurrent neural networks (RNNs) used to model biological neural circuits, noise is typically injected during training to emulate biological variability and regularize learning. The expectation is that removing the noise at test time should preserve or improve performance. Contrary to this intuition, we find that continuous-time recurrent neural networks (CTRNNs) often perform best at a nonzero test-time noise level, frequently close to the level used during training. This noise preference typically arises when noise is injected inside the neural activation function; networks trained with noise injected outside the activation function perform best with zero noise. Given sufficiently large training noise, the phenomenon arises robustly across diverse tasks, including function approximation, maze navigation, 2D path integration, and a multi-task suite from cognitive neuroscience; we also observe it in feedforward neural networks, not just RNNs. Through analyses of simple function-approximation and single-neuron regulator tasks, we show that the phenomenon stems from noise-induced shifts of fixed points (stationary distributions) in the underlying stochastic dynamics of the RNNs, providing a mechanistic account. These fixed-point shifts depend on the noise level and bias the network outputs when the noise is removed, degrading performance. Analytical and numerical results show that the bias arises when neural states operate near activation-function nonlinearities, where noise is asymmetrically attenuated, and that performance optimization incentivizes operation near these nonlinearities; such incentives exist for networks with noise inside the activation function but not for networks with noise outside it, explaining why only noise-in networks show the preference. Thus, networks can overfit to the stochastic training environment itself, not just to the input–output data.
The phenomenon is distinct from stochastic resonance, wherein nonzero noise enhances signal processing. Our findings reveal that training noise can become an integral part of the computation learned by recurrent networks, with implications for understanding neural population dynamics and for the design of robust artificial RNNs.
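The noise-in versus noise-out distinction central to the abstract can be made concrete with a minimal Euler-discretized CTRNN step. This is an illustrative sketch, not the paper's implementation: the function name, the `tau dx/dt = -x + W f(x) + b` formulation, and all parameter names are assumptions chosen for clarity.

```python
import numpy as np

def ctrnn_step(x, W, b, sigma, noise_in=True, dt=0.1, tau=1.0, rng=None):
    """One Euler step of a CTRNN with injected Gaussian noise (illustrative sketch).

    noise_in=True passes the noise through the tanh nonlinearity (the
    configuration the abstract reports as developing a nonzero-noise
    preference); noise_in=False adds it to the state outside the
    nonlinearity (the configuration that prefers zero test-time noise).
    """
    rng = rng or np.random.default_rng()
    xi = sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    if noise_in:
        r = np.tanh(x + xi)  # noise is attenuated asymmetrically near saturation
        dx = (-x + W @ r + b) * (dt / tau)
    else:
        r = np.tanh(x)
        dx = (-x + W @ r + b) * (dt / tau) + xi  # purely additive state noise
    return x + dx
```

At `sigma=0` the two variants coincide; at nonzero `sigma` they diverge precisely where `tanh` saturates, which is where the abstract locates the mechanism.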
Certifications: J2C Certification
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: No changes except making the manuscript camera-ready and linking to the GitHub repository containing the code.
Supplementary Material: zip
Assigned Action Editor: ~Christian_Keup1
Submission Number: 6921