Controlling Neural Network Smoothness for Neural Algorithmic Reasoning
Abstract: The modelling framework of neural algorithmic reasoning (Veličković & Blundell, 2021) postulates that a continuous neural network may learn to emulate the discrete reasoning steps of a symbolic algorithm. We investigate the underlying hypothesis in the simplest conceivable scenario – the addition of real numbers. Our results show that two-layer neural networks fail to learn the structure of the task, even though their hypothesis class contains multiple exact representations of the true function. Growing the network's width leads to highly complex error regions in the input space. Moreover, we find that the network fails to generalise with increasing severity i) in the training domain, ii) outside of the training domain but within its convex hull, and iii) outside the training domain's convex hull. This behaviour can be emulated with Gaussian process regressors that use radial basis function kernels of decreasing length scale. Classical results establish an equivalence between Gaussian processes and infinitely wide neural networks. On a sinusoidal regression problem, we demonstrate a tight link between the scaling of the standard deviation of a network's weights and its effective length scale. This suggests simple modifications to control the length scale of the function learned by a neural network and, thus, its smoothness. This has important applications for the different generalisation scenarios listed above, and it also suggests a partial remedy to the brittleness of neural network predictions as exposed by adversarial examples. We demonstrate the gains in adversarial robustness that our modification achieves on a standard classification problem of handwritten digit recognition. In conclusion, this work shows inherent problems of neural networks even for the simplest algorithmic tasks which, however, may be partially remedied through links to Gaussian processes.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Highlighted design choices as directly following the constructions in Neal (1996). Added experiments on CIFAR10 with a 2-layer MLP (same architecture as used for MNIST). Added stronger adversarial attacks (AutoAttack), obtaining the same qualitative results and ordering of models. Fixed typo in eq. 6. Added reference to the Pointer Value Retrieval paper. Added reference to Jordan et al. (2019). Fixed enumeration in Background section. Added discussion of the Xu et al. (2021) and Rahaman et al. (2019) references. Added paragraph about activation normalisation to the discussion. Added interpolation experiment and tentative explanations to Appendix. Added comparison between TanH and ReLU+weight_decay to Appendix. Added experimental details and results for deeper ResNet9 model to Appendix. Added references in the main text to the new experiments in the Appendix. Added more experiments and discussion about the optimal solution and interpolation experiments to Appendix.
Assigned Action Editor: ~Behnam_Neyshabur1
Submission Number: 638
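
Illustrative sketch (not the authors' code): the abstract links the standard deviation of a network's weights to the effective length scale of the function it represents. The snippet below is a minimal, standard-library-only illustration of that idea under stated assumptions. It draws random two-layer tanh networks in the spirit of the prior constructions of Neal (1996) and reports a simple roughness measure for several values of the weight scale. The names `sample_network`, `roughness`, and `mean_roughness`, the choice to scale both input weights and biases by the same `sigma`, and the mean squared slope as a proxy for inverse length scale are all assumptions made for this example and are not taken from the paper.

```python
import math
import random


def sample_network(hidden, sigma, rng):
    """Draw one random two-layer tanh network with 1-D input and output.

    Assumption: input weights and biases are N(0, sigma^2), output weights
    are N(0, 1) and the output is scaled by 1/sqrt(hidden).
    """
    w = [rng.gauss(0.0, sigma) for _ in range(hidden)]
    b = [rng.gauss(0.0, sigma) for _ in range(hidden)]
    v = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
    scale = 1.0 / math.sqrt(hidden)

    def f(x):
        return scale * sum(
            vj * math.tanh(wj * x + bj) for wj, bj, vj in zip(w, b, v)
        )

    return f


def roughness(f, lo=-1.0, hi=1.0, points=201):
    """Mean squared finite-difference slope of f on [lo, hi].

    Used here as a crude proxy: a shorter length scale gives a larger value.
    """
    dx = (hi - lo) / (points - 1)
    ys = [f(lo + i * dx) for i in range(points)]
    total = sum(((ys[i + 1] - ys[i]) / dx) ** 2 for i in range(points - 1))
    return total / (points - 1)


def mean_roughness(sigma, hidden=100, draws=20, seed=0):
    """Average the roughness over several independently sampled networks."""
    rng = random.Random(seed)
    values = [roughness(sample_network(hidden, sigma, rng)) for _ in range(draws)]
    return sum(values) / draws


if __name__ == "__main__":
    for sigma in (1.0, 3.0, 10.0):
        print(f"sigma={sigma:5.1f}  mean squared slope={mean_roughness(sigma):.3f}")
```

Under these assumptions, larger values of `sigma` should yield rougher sampled functions, which is the qualitative behaviour of a Gaussian process with a shorter length scale. The sketch only inspects randomly initialised networks; it does not reproduce the paper's training procedure or its sinusoidal regression experiment.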