Keywords: CNN, convolution, foveation, padding, boundary condition
TL;DR: Visual processing in CNNs is inherently foveated, as it is in human vision; the Gaussian distribution of retinal photoreceptors is probably a consequence, not the source, of this effect.
Abstract: When convolutional layers apply no padding, central pixels have more ways to contribute to the convolution than peripheral pixels. This discrepancy grows exponentially with the number of layers, leading to implicit foveation of the input pixels. We show that the discrepancy can persist even when padding is applied. In particular, with the commonly used zero-padding, foveation effects are significantly reduced but not eliminated. We explore how different aspects of convolution arithmetic affect the extent and magnitude of these effects, and elaborate on which alternative padding techniques can mitigate them. Finally, we compare our findings with foveation in human vision, suggesting that the two effects may share a similar nature and similar implications.
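The path-counting argument for the unpadded case can be illustrated with a minimal sketch. It is a simplification under stated assumptions rather than the paper's own method: a 1D input, stride 1, no dilation, and all-ones kernels, so that only the number of routes from each input position to the final output is counted. The function name `path_counts` and its parameters are hypothetical.

```python
import numpy as np

def path_counts(n, k, layers):
    """Count the paths from each of n input positions to the final output
    through `layers` unpadded, stride-1, 1D convolutions of width k.
    Illustrative assumption: every kernel weight is 1."""
    out_len = n - layers * (k - 1)  # each unpadded layer trims k - 1 positions
    if out_len < 1:
        raise ValueError("input too short for this many unpadded layers")
    counts = np.ones(out_len, dtype=np.int64)  # one path ends at each output
    kernel = np.ones(k, dtype=np.int64)
    for _ in range(layers):
        # Walking back one layer widens the map by k - 1 positions and
        # sums the paths of every output that a position feeds into.
        counts = np.convolve(counts, kernel, mode="full")
    return counts

counts = path_counts(n=7, k=3, layers=2)
print(counts)  # [1 3 6 7 6 3 1]
```

In this toy setting the outermost position always has exactly one path to the output, while the central count approaches `k ** layers` when the input is wide enough, which is the exponential gap between center and periphery that the abstract describes.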