On the numerical reliability of nonsmooth autodiff: a MaxPool case study

Published: 19 Jun 2024, Last Modified: 19 Jun 2024. Accepted by TMLR.
Abstract: This paper considers the reliability of automatic differentiation (AD) for neural networks involving the nonsmooth MaxPool operation, across precision levels (16, 32, 64 bits), architectures (LeNet, VGG, ResNet), and datasets (MNIST, CIFAR10, SVHN, ImageNet). Although AD can be incorrect, recent research has shown that it coincides with the derivative almost everywhere, even in the presence of nonsmooth operations. In practice, however, AD operates on floating-point numbers, so there is a need to characterize the subsets on which AD can be {\em numerically} incorrect. Recently, \cite{bertoin2021numerical} empirically studied how the choice of $\ReLU'(0)$ changes the output of AD and defined a numerical bifurcation zone where using $\ReLU'(0) = 0$ differs from using $\ReLU'(0) = 1$. To extend this to a broader class of nonsmooth operations, we propose a new numerical bifurcation zone (where AD is incorrect over real numbers) and define a compensation zone (where AD is incorrect over floating-point numbers but correct over the reals). Training with SGD, we find that nonsmooth MaxPool Jacobians with lower norms maintain stable and efficient test accuracy, while higher norms can result in instability and decreased performance. Batch normalization, Adam-like optimizers, or increased precision can reduce the influence of MaxPool Jacobians.
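As a minimal illustration of the precision-dependent behavior the abstract describes, the sketch below shows how the AD output of a MaxPool-style operation can change when inputs are cast to lower precision: two entries that differ in 64-bit arithmetic collapse to a tie in 32-bit arithmetic, flipping the argmax and hence the Jacobian row. The `maxpool_jacobian` helper is hypothetical and uses a first-argmax tie-breaking convention for illustration; it is not the paper's exact definition, nor any particular framework's implementation.

```python
import struct

def to_fp32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack('f', struct.pack('f', x))[0]

def maxpool_jacobian(window):
    """One row of a MaxPool Jacobian under an illustrative AD convention:
    the gradient flows to the first index attaining the maximum.
    (Hypothetical helper; real frameworks may break ties differently.)"""
    m = max(window)
    row = [0.0] * len(window)
    row[window.index(m)] = 1.0
    return row

# In 64-bit arithmetic the two entries differ, so AD routes the
# gradient to index 1.
w64 = [1.0, 1.0 + 1e-9]
print(maxpool_jacobian(w64))   # [0.0, 1.0]

# Cast to 32-bit: 1.0 + 1e-9 rounds to 1.0, the entries tie, the
# argmax flips to index 0, and the AD output changes -- an input
# inside the numerical bifurcation zone.
w32 = [to_fp32(v) for v in w64]
print(maxpool_jacobian(w32))   # [1.0, 0.0]
```

The same mechanism underlies the paper's distinction between the bifurcation zone (the AD output disagrees with the real-number derivative) and the compensation zone (floating-point errors elsewhere cancel so the final output is still correct).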
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Camera-ready version.
Code: https://github.com/ryanboustany/MaxPool-numerical
Assigned Action Editor: ~Yunwen_Lei1
Submission Number: 2280