FoVAE: Reconstructive Foveation as a Self-Supervised Variational Inference Task for Visual Representation Learning

Published: 27 Oct 2023, Last Modified: 27 Oct 2023. Gaze Meets ML 2023 Poster.
Keywords: foveation, reconstruction, variational autoencoder, predictive coding
TL;DR: A VAE model that performs zero-shot reconstruction of simple visual datasets driven by a self-supervised reconstructive foveation mechanism.
Abstract: We present the first steps toward a model of visual representation learning driven by a self-supervised reconstructive foveation mechanism. Tasked with looking at one visual patch at a time while reconstructing the current patch, predicting the next patch, and reconstructing the full image after a set number of timesteps, FoVAE learns to reconstruct images from the MNIST and Omniglot datasets, while inferring high-level priors about the whole image. In line with theories of Bayesian predictive coding in the brain and prior work on human foveation biases, the model combines bottom-up input processing with top-down learned priors to reconstruct its input, choosing foveation targets that balance local feature predictability with global information gain. FoVAE is able to transfer its priors and foveation policy across datasets to reconstruct samples from untrained datasets in a zero-shot transfer-learning setting. By showing that robust and domain-general policies of generative inference and action-based information gathering emerge from simple biologically plausible inductive biases, this work paves the way for further exploration of the role of foveation in visual representation learning.
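The fixation-selection idea in the abstract (choosing foveation targets by expected information gain while incrementally reconstructing the whole image) can be illustrated with a toy sketch. This is not FoVAE's implementation: here, directly copying the observed patch stands in for the VAE's patch reconstruction, and residual reconstruction error stands in for information gain. All function names (`extract_patch`, `foveation_rollout`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_patch(img, cy, cx, size=7):
    """Crop a square foveal patch centered near (cy, cx), clipped to bounds."""
    h, w = img.shape
    half = size // 2
    y0 = int(np.clip(cy - half, 0, h - size))
    x0 = int(np.clip(cx - half, 0, w - size))
    return img[y0:y0 + size, x0:x0 + size], y0, x0

def foveation_rollout(img, n_steps=5, size=7):
    """Greedy foveation loop: fixate wherever the running reconstruction is
    worst (a crude proxy for 'global information gain'), observe that patch,
    and fold it into the whole-image canvas."""
    recon = np.zeros_like(img)              # running whole-image reconstruction
    fixations = []
    for _ in range(n_steps):
        err = (img - recon) ** 2            # per-pixel residual error
        cy, cx = np.unravel_index(np.argmax(err), err.shape)
        patch, y0, x0 = extract_patch(img, cy, cx, size)
        # Stand-in for "reconstruct the current patch": copy it in directly.
        recon[y0:y0 + size, x0:x0 + size] = patch
        fixations.append((int(cy), int(cx)))
    full_image_loss = float(np.mean((img - recon) ** 2))
    return recon, fixations, full_image_loss

img = rng.random((28, 28))                  # MNIST-sized dummy image
recon, fixations, loss = foveation_rollout(img)
```

Each fixation zeroes the residual under the observed patch, so later fixations are pushed toward still-unexplained regions, loosely mirroring the balance between local predictability and global coverage described above.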
Submission Type: Extended Abstract
Supplementary Material: zip
Submission Number: 24