Sequential Coordination of Deep Models for Learning Visual Arithmetic


Nov 07, 2017 (modified: Nov 07, 2017) ICLR 2018 Conference Blind Submission
  • Abstract: Achieving machine intelligence requires a smooth integration of perception and reasoning. Yet the models we have developed to date tend to specialize in one or the other; sophisticated manipulation of symbols acquired from rich perceptual spaces has so far proved elusive. Consider a visual arithmetic task, where an agent must learn to solve mathematical expressions captured in natural conditions (e.g., hand-written and with a background). We propose a two-tiered architecture for tackling this problem. At the lower level, we leverage a collection of pre-trained deep perceptual models that can be used to detect and extract representations of characters in the image. At the higher level, we use reinforcement learning to learn when to apply the perceptual networks and what transformations to apply to their outputs. The resulting model is able to solve a variety of tasks in the Visual Arithmetic domain and has several advantages over standard convolutional models, including greatly improved sample efficiency.
  • TL;DR: We use reinforcement learning to train an agent to solve a set of visual arithmetic tasks using provided pre-trained perceptual modules and transformations of internal representations created by those modules.
  • Keywords: reinforcement learning, pretrained, deep learning, perception, algorithmic
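The abstract does not specify the reinforcement learning algorithm, the action set, or the state representation, so the following is only a minimal Python sketch of the two-tiered idea under loudly hypothetical assumptions. Glyphs are plain digit strings, the "pre-trained perceptual module" `classify` is an exact stub, the only transformation is summation, and the upper tier is tabular Q-learning, swapped in here purely for brevity. None of `classify`, `ToyVisualSum`, `train`, or `solve` comes from the paper.

```python
import random

# HYPOTHETICAL stand-in for a pre-trained perceptual module. In the paper's
# setting this would be a deep network reading a hand-written character;
# here a "glyph" is just a digit string and classification is exact.
def classify(glyph: str) -> int:
    return int(glyph)

CLASSIFY, MOVE, SUBMIT = 0, 1, 2  # hypothetical high-level actions
ACTIONS = (CLASSIFY, MOVE, SUBMIT)


class ToyVisualSum:
    """Toy environment (hypothetical): output the sum of a row of digit glyphs."""

    def __init__(self, glyphs):
        self.glyphs = list(glyphs)
        self.target = sum(int(g) for g in self.glyphs)  # ground-truth answer
        self.cursor = 0
        self.memory = [None] * len(self.glyphs)  # stored internal representations

    def state(self):
        # Abstract state: cursor position and which memory slots are filled.
        # It never contains digit values, so one policy works for any digits.
        return (self.cursor, tuple(m is not None for m in self.memory))

    def step(self, action):
        """Apply one action; return (reward, done, answer)."""
        if action == CLASSIFY:
            # Lower tier: run the perceptual module on the attended glyph.
            self.memory[self.cursor] = classify(self.glyphs[self.cursor])
            return 0.0, False, None
        if action == MOVE:
            self.cursor = (self.cursor + 1) % len(self.glyphs)
            return 0.0, False, None
        # SUBMIT: transform the stored representations into an answer.
        if None in self.memory:
            return 0.0, True, None
        answer = sum(self.memory)
        return (1.0 if answer == self.target else 0.0), True, answer


def greedy(q, s):
    """Pick the action with the highest estimated return (ties: first action)."""
    return max(ACTIONS, key=lambda act: q.get((s, act), 0.0))


def train(n_digits=2, episodes=5000, alpha=0.5, gamma=0.9, epsilon=0.2,
          max_steps=10, seed=0):
    """Upper tier: tabular Q-learning over the abstract state (an assumption)."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated discounted return
    for _ in range(episodes):
        env = ToyVisualSum(str(rng.randint(0, 9)) for _ in range(n_digits))
        s = env.state()
        for _ in range(max_steps):
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q, s)
            r, done, _ = env.step(a)
            s2 = env.state()
            best_next = max(q.get((s2, b), 0.0) for b in ACTIONS)
            td_target = r if done else r + gamma * best_next
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (td_target - old)
            s = s2
            if done:
                break
    return q


def solve(q, glyphs, max_steps=10):
    """Run the learned greedy policy on a new input; return its answer or None."""
    env = ToyVisualSum(glyphs)
    for _ in range(max_steps):
        _, done, answer = env.step(greedy(q, env.state()))
        if done:
            return answer
    return None
```

For example, `solve(train(), "37")` runs the learned greedy policy on a two-digit input. Because the abstract state records only the cursor and which memory slots are filled, never the digit values, a policy trained on random pairs applies to any two-digit input; it does not transfer to inputs of a different length.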