Keywords: Foveation, Active Perception, Data-Augmentation, Self-Supervised Learning
TL;DR: We frame spatially-adaptive computation (foveation) and saccades as biological proxies of data augmentation for self-supervised learning
Abstract: Self-supervised learning is a strong way to learn useful representations from the bulk of natural data. It's suggested to be responsible for building the visual representation in humans, but the specific objective and algorithm are unknown. Currently, most self-supervised methods encourage the system to learn an invariant representation of different transformations of the same image in contrast to those of other images. However, such transformations are generally non-biologically plausible, and often consist of contrived perceptual schemes such as random cropping and color jittering. In this paper, we attempt to reconfigure these augmentations to be more biologically or perceptually plausible while still conferring the same benefits for encouraging a good representation. Critically, we find that random cropping can be substituted by cortical magnification, and saccade-like sampling of the image could also assist the representation learning. The feasibility of these transformations suggests a potential way that biological visual systems could implement self-supervision. Further, they break the widely accepted spatially-uniform processing assumption used in many computer vision algorithms, suggesting a role for spatially-adaptive computation in humans and machines alike.
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/arxiv:2112.07173/code)