Abstract: Deep convolutional network architectures are often assumed to guarantee generalization for small image translations and deformations. In this paper we show that modern CNNs (VGG16, ResNet50, and InceptionResNetV2) can drastically change their output when an image is translated in the image plane by a few pixels, and that this failure of generalization also occurs with other realistic small image transformations. Furthermore, these failures to generalize appear more frequently in more modern networks. We show that these failures are related to the fact that the architecture of modern CNNs ignores the classical sampling theorem, so that generalization is not guaranteed. We also show that biases in the statistics of commonly used image datasets make it unlikely that CNNs will learn to be invariant to these transformations. Taken together, our results suggest that the performance of CNNs in object recognition falls far short of the generalization capabilities of humans.
Keywords: Convolutional neural networks, sampling theorem, sensitivity to small image transformations, dataset bias, shiftability
TL;DR: Modern deep CNNs are not invariant to translations, scalings, and other realistic image transformations, and this lack of invariance is related to the subsampling operation and to biases in image datasets.
Code: [AzulEye/CNN-Failures](https://github.com/AzulEye/CNN-Failures) + [3 community implementations](https://paperswithcode.com/paper/?openreview=HJxYwiC5tm)
Data: [Perceptual Similarity](https://paperswithcode.com/dataset/perceptual-similarity)
Community Implementations: [4 code implementations](https://www.catalyzex.com/paper/why-do-deep-convolutional-networks-generalize/code)
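The link between subsampling and the loss of shift equivariance can be illustrated with a minimal sketch (not taken from the linked repository, and using a hypothetical 1-D signal rather than an image): a stride-2 max-pool, the standard CNN subsampling step, applied to an input shifted by a single sample. A pipeline that respected the sampling theorem would map a small input shift to (at most) a shifted copy of its output; here the output values themselves change.

```python
# Minimal sketch (assumption, not the authors' code): stride-2 max-pooling,
# as used for subsampling in CNNs, is not equivariant to a 1-sample shift.
import numpy as np

def maxpool_stride2(x):
    """Max-pool with window 2 and stride 2 over a 1-D signal."""
    return np.maximum(x[0::2], x[1::2])

rng = np.random.default_rng(0)
x = rng.random(16)          # hypothetical input signal
x_shift = np.roll(x, 1)     # translate the input by one sample

y = maxpool_stride2(x)
y_shift = maxpool_stride2(x_shift)

# If the operation were shift-equivariant, y_shift would match some shifted
# copy of y. Generically it matches none of them:
diffs = [np.max(np.abs(np.roll(y, k) - y_shift)) for k in range(len(y))]
print("best match over all output shifts:", min(diffs))
```

The same effect compounds across the repeated subsampling stages of a deep network, which is one way to read the paper's observation that a few-pixel translation can drastically change a modern CNN's output.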