Task-driven Discovery of Perceptual Schemas for Generalization in Reinforcement Learning

Wilka Torrico Carvalho; Andrew Kyle Lampinen; Kyriacos Nikiforou; Felix Hill; Murray Shanahan

Task-driven Discovery of Perceptual Schemas for Generalization in Reinforcement Learning

Wilka Torrico Carvalho, Andrew Kyle Lampinen, Kyriacos Nikiforou, Felix Hill, Murray Shanahan

Published: 28 Jan 2022, Last Modified: 13 Feb 2023ICLR 2022 SubmittedReaders: Everyone

Keywords: deep reinforcement learning, reinforcement learning, deep learning, compositional generalization, generalization, recurrent architecture

Abstract: Deep reinforcement learning (Deep RL) has recently seen significant progress in developing algorithms for generalization. However, most algorithms target a single type of generalization setting. In this work, we study generalization across three disparate task structures: (a) tasks composed of spatial and temporal compositions of regularly occurring object motions; (b) tasks composed of active perception of and navigation towards regularly occurring 3D objects; and (c) tasks composed of navigating through sequences of regularly occurring object-configurations. These diverse task structures all share an underlying idea of compositionality: task completion always involves combining reoccurring segments of task-oriented perception and behavior. We hypothesize that an agent can generalize within a task structure if it can discover representations that capture these reoccurring task-segments. For our tasks, this corresponds to representations for recognizing individual object motions, for navigation towards 3D objects, and for navigating through object-configurations. Taking inspiration from cognitive science, we term representations for reoccurring segments of an agent's experience, "perceptual schemas". We propose Composable Perceptual Schemas (CPS), which learns a composable state representation where perceptual schemas are distributed across multiple, relatively small recurrent "subschema" modules. Our main technical novelty is an expressive attention function that enables subschemas to dynamically attend to features shared across all positions in the agent's observation. Our experiments indicate our feature-attention mechanism enables CPS to generalize better than recurrent architectures that attend to observations with spatial attention.

One-sentence Summary: We develop a modular and composable architecture that leverages dynamic feature attention to generalize across diverse environments.

Supplementary Material: zip

14 Replies

Loading