What You See Is What You Transform: Foveated Spatial Transformers as a bio-inspired attention mechanism

IJCNN 2022
Abstract: Decoding the semantic content of images is nowadays dominated by deep convolutional neural networks (DCNNs). However, their generalization capability is still undermined by the limited translation invariance of their max-pooling layers. Taking inspiration from biological vision, we develop here a new methodology for translation-invariant processing with DCNNs. We build upon a recent model that implements two key biological mechanisms: foveated vision and the separation of visual processing into "what" and "where" pathways. Alongside such foveal vision, we demonstrate the capability of a foveated spatial transformer to learn both pathways in an end-to-end fashion, without any spatial labelling whatsoever. Our results pave the way towards a new class of spatial visual transformers, implementing the principles of active (saccadic) vision over large visual displays.
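To make the two-pathway idea concrete, below is a minimal sketch (not the authors' released code) of how a "where" pathway predicting a fixation, a differentiable foveal crop via a spatial transformer, and a "what" pathway classifying the glimpse could be wired together end-to-end in PyTorch. All module names, layer sizes, and the fixed glimpse scale are illustrative assumptions.

```python
# Sketch of a two-pathway "foveated spatial transformer" (assumed design, not the paper's code):
# a "where" network predicts a fixation from a coarse view, a differentiable crop
# (affine grid + grid_sample) extracts a foveal glimpse, and a "what" network classifies it.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FoveatedSpatialTransformer(nn.Module):
    def __init__(self, num_classes=10, glimpse_scale=0.25):
        super().__init__()
        self.glimpse_scale = glimpse_scale  # fraction of the display covered by the "fovea" (assumed)
        # "Where" pathway: predicts a fixation (tx, ty) in [-1, 1] from a downsampled view.
        self.where = nn.Sequential(
            nn.Conv2d(1, 8, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(8, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 2), nn.Tanh(),
        )
        # "What" pathway: classifies the high-resolution foveal glimpse.
        self.what = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_classes),
        )

    def forward(self, x, glimpse_size=32):
        # A coarse, low-resolution view drives the "where" pathway.
        coarse = F.interpolate(x, size=(32, 32), mode="bilinear", align_corners=False)
        txy = self.where(coarse)  # (B, 2) fixation in normalized coordinates
        # Affine grid that crops a small window centred on the fixation;
        # gradients flow back into the "where" pathway through grid_sample,
        # so no spatial labels are needed.
        B = x.size(0)
        theta = torch.zeros(B, 2, 3, device=x.device, dtype=x.dtype)
        theta[:, 0, 0] = self.glimpse_scale
        theta[:, 1, 1] = self.glimpse_scale
        theta[:, :, 2] = txy
        grid = F.affine_grid(theta, (B, x.size(1), glimpse_size, glimpse_size),
                             align_corners=False)
        glimpse = F.grid_sample(x, grid, align_corners=False)
        return self.what(glimpse), txy

# Usage: a single-channel 128x128 "large display".
model = FoveatedSpatialTransformer(num_classes=10)
logits, fixation = model(torch.randn(4, 1, 128, 128))
print(logits.shape, fixation.shape)  # torch.Size([4, 10]) torch.Size([4, 2])
```

The key design point is that the crop is differentiable, so a classification loss on the "what" output also trains the "where" pathway, mirroring the end-to-end, label-free learning of both pathways described in the abstract.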