Keywords: Computational Neuroscience, Hierarchical Variational Autoencoders, Biological Vision, Deep Generative Model, Top-down effects
TL;DR: Top-down hierarchical Variational Autoencoders trained on natural images learn representations similar to those of primate V1/V2 and model phenomena such as noise correlations, illusory contour responses, and image inpainting.
Abstract: Interpreting computations in the visual cortex as learning and inference in a generative model of the environment has received wide support in both neuroscience and cognitive science. However, hierarchical computations, a hallmark of visual cortical processing, have remained out of reach for generative models because adequate tools to address them were lacking. Here, we capitalize on advances in Variational Autoencoders (VAEs) to investigate the early visual cortex with sparse-coding two-layer hierarchical VAEs trained on natural images. We show that representations similar to those found in the primary and secondary visual cortices emerge naturally under mild inductive biases. In particular, the high-level latent space represents texture-like patterns reminiscent of the secondary visual cortex. We show that a neuroscience-inspired choice of the recognition model is important for learning noise correlations, performing image inpainting, and detecting illusory edges. We argue that top-down interactions, a key feature of biological vision, arise naturally from hierarchical inference. We also demonstrate that model predictions are in line with existing V1 measurements in macaques with regard to noise correlations and illusory contour stimuli.
Supplementary Material: zip