Decoupling Dynamics and Reward for Transfer Learning

Amy Zhang, Harsh Satija, Joelle Pineau

Feb 12, 2018 (modified: Jun 04, 2018) ICLR 2018 Workshop Submission readers: everyone Show Bibtex
  • Abstract: Reinforcement Learning (RL) provides a sound decision-theoretic framework to optimize the behavior of learning agents in an interactive setting. However, one of the limitations to applications of RLto real-world tasks is the amount of data required for learning an optimal policy. Our goal is to design an RL model that can be efficiently trained on new tasks, and produce solutions that generalize well beyond the training environment. We take inspiration from Successor Features (Dayan, 1993), which decouples the value function representation into dynamics and rewards, and learns them separately. We take this further by explicitly decoupling learning the state representation, reward function, forward dynamics, and inverse dynamics of the environment. We posit that we can learn a representation space \mathcal{Z} via this decoupling that makes downstream learning easier as: (1) the modules can be learned separately enabling efficient reuse of common knowledge across tasks to quickly adapt to new tasks; (2) the modules can be optimized jointly leading to a representation space that is adapted to the policy and value function, rather than only the observation space; (3) the dynamics model enables forward search and planning, in the usual model-based RL way. Our approach is the first model-based RL method to explicitly incorporate learning of inverse dynamics, and we show that this plays an important role in stabilizing learning
  • Keywords: Reinforcement Learning, Transfer Learning, Representation Learning
  • TL;DR: Decoupling training dynamics model and reward models to facilitate transfer