Learning generative models with visual attention

Charlie Tang, Nitish Srivastava, Ruslan Salakhutdinov

Dec 24, 2013 (modified: Dec 24, 2013) ICLR 2014 conference submission readers: everyone
  • Decision: submitted, no decision
  • Abstract: Attention has long been proposed by psychologists as important for effectively dealing with the enormous amount of sensory stimuli available in the neocortex. Inspired by visual attention models in computational neuroscience and by the need for deep generative models to learn from object-centric data, we describe a framework for generative learning using attentional mechanisms. Attentional mechanisms propagate signals from a region of interest in a scene to higher-layer areas of canonical representation, where generative modeling takes place. By ignoring background clutter, the generative model can concentrate its resources on modeling the objects of interest. Our model is a proper graphical model in which the 2D similarity transformation from computer vision is part of the top-down process. A ConvNet is used to initialize good guesses during posterior inference, which is based on Hamiltonian Monte Carlo. After learning on face images, we demonstrate that our model can robustly attend to the face regions of novel test subjects. Most importantly, our model can learn generative models of new faces from a novel dataset of large images where the location of the face is not known.
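
The 2D similarity transformation mentioned in the abstract (uniform scale, rotation, and translation) can be illustrated with a minimal sketch. The abstract does not give the paper's parameterization, so the function names, the (scale, theta, tx, ty) parameters, and the [-1, 1] canonical grid below are assumptions made for illustration only, not the authors' implementation.

```python
import math

def similarity_matrix(scale, theta, tx, ty):
    """Build a 2x3 matrix for a 2D similarity transform.

    Hypothetical parameterization: uniform scale, rotation angle theta
    (radians), and translation (tx, ty).
    """
    a = scale * math.cos(theta)
    b = scale * math.sin(theta)
    return [[a, -b, tx],
            [b,  a, ty]]

def apply_transform(matrix, point):
    """Map one (x, y) point from canonical to image coordinates."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])

def attended_grid(matrix, size):
    """Map a size x size canonical grid on [-1, 1]^2 into the image.

    Each entry is the image location that a canonical pixel would be
    read from; interpolation of pixel values is omitted in this sketch.
    """
    coords = [-1.0 + 2.0 * i / (size - 1) for i in range(size)]
    return [[apply_transform(matrix, (x, y)) for x in coords] for y in coords]

# Usage: an attention window scaled by 2, rotated 90 degrees, centered at (10, 5).
window = similarity_matrix(2.0, math.pi / 2, 10.0, 5.0)
grid = attended_grid(window, 3)
```

In a setup like this, the four transform parameters are the quantities that inference would have to estimate for each image; the canonical grid stays fixed while the window moves over the scene.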