Foveated convolutions: improving spatial transformer networks by modelling the retina

04 Oct 2020 (modified: 05 Oct 2020), OpenReview Archive Direct Upload
Abstract: Spatial Transformer Networks (STNs) have the potential to dramatically improve performance of convolutional neural networks in a range of tasks. By ‘focusing’ on the salient parts of the input using a differentiable affine transform, a network augmented with an STN should have increased performance, efficiency and interpretability. However, in practice, STNs rarely exhibit these desiderata, instead converging to a seemingly meaningless transformation of the input. We demonstrate and characterise this localisation problem as deriving from the spatial invariance of feature detection layers acting on extracted glimpses. Drawing on the neuroanatomy of the human eye we then motivate a solution: foveated convolutions. These parallel convolutions with a range of strides and dilations introduce specific translational variance into the model. In so doing, the foveated convolution presents an inductive bias, encouraging the subject of interest to be centred in the output of the attention mechanism, giving significantly improved performance. The code for all experiments is available at https://github.com/ethanwharris/foveated-convolutions
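The abstract describes foveated convolutions as parallel convolutions applied at a range of strides and dilations, whose translational variance biases the attention mechanism toward centring the subject. The official implementation is at the linked repository; as a purely illustrative sketch (not the authors' code), the following NumPy snippet shows the core idea of running one kernel at several dilations in parallel and stacking the centre-aligned responses. All function names and the choice of dilations here are hypothetical.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=1, stride=1):
    """Valid 2D convolution (single channel) with dilation and stride."""
    kh, kw = kernel.shape
    eff_h = (kh - 1) * dilation + 1  # effective receptive-field height
    eff_w = (kw - 1) * dilation + 1
    H, W = x.shape
    out_h = (H - eff_h) // stride + 1
    out_w = (W - eff_w) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride : i * stride + eff_h : dilation,
                      j * stride : j * stride + eff_w : dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

def foveated_conv(x, kernel, dilations=(1, 2, 3)):
    """Apply the same kernel at several dilations in parallel and stack
    the centre-cropped responses: fine detail near the centre, coarse
    context further out, loosely mimicking the retinal fovea."""
    responses = [dilated_conv2d(x, kernel, d) for d in dilations]
    # crop every response map to the smallest one so they stack cleanly
    min_h = min(r.shape[0] for r in responses)
    min_w = min(r.shape[1] for r in responses)
    cropped = []
    for r in responses:
        top = (r.shape[0] - min_h) // 2
        left = (r.shape[1] - min_w) // 2
        cropped.append(r[top:top + min_h, left:left + min_w])
    return np.stack(cropped)

glimpse = np.random.default_rng(0).standard_normal((16, 16))
kernel = np.ones((3, 3)) / 9.0  # simple averaging kernel for illustration
out = foveated_conv(glimpse, kernel)
print(out.shape)  # (3, 10, 10): one response map per dilation
```

Because each dilation sees the glimpse at a different effective scale while sharing one spatial grid, the stacked output responds differently to off-centre versus centred content, which is the translational variance the abstract argues is missing from standard STN feature extractors.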