Decoupling Dynamics and Reward for Transfer Learning

12 Feb 2018 (modified: 05 May 2023) · ICLR 2018 Workshop Submission
Abstract: Reinforcement Learning (RL) provides a sound decision-theoretic framework for optimizing the behavior of learning agents in an interactive setting. However, one of the limitations of applying RL to real-world tasks is the amount of data required to learn an optimal policy. Our goal is to design an RL model that can be efficiently trained on new tasks and produces solutions that generalize well beyond the training environment. We take inspiration from Successor Features (Dayan, 1993), which decouple the value function representation into dynamics and rewards and learn them separately. We take this further by explicitly decoupling the learning of the state representation, reward function, forward dynamics, and inverse dynamics of the environment. We posit that this decoupling lets us learn a representation space \mathcal{Z} that makes downstream learning easier, because: (1) the modules can be learned separately, enabling efficient reuse of common knowledge across tasks to quickly adapt to new tasks; (2) the modules can be optimized jointly, leading to a representation space that is adapted to the policy and value function rather than only to the observation space; (3) the dynamics model enables forward search and planning, in the usual model-based RL way. Our approach is the first model-based RL method to explicitly incorporate learning of inverse dynamics, and we show that this plays an important role in stabilizing learning.
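The modular decomposition described above can be illustrated with a minimal numpy sketch. This is not the paper's actual architecture: the linear modules, dimensions, and toy linear environment below are invented for illustration. It shows the four separate modules (encoder, forward dynamics, inverse dynamics, reward) sharing a latent space, and how one module (forward dynamics) can be adapted on new data while the shared encoder stays frozen, mimicking reuse of common knowledge across tasks.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, LATENT_DIM, ACT_DIM = 8, 4, 2  # hypothetical sizes, not from the paper

# Four separately parameterized modules over a shared latent space Z.
W_enc = rng.normal(scale=0.1, size=(LATENT_DIM, OBS_DIM))               # encoder  phi(s) -> z
W_fwd = rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM + ACT_DIM))  # forward  f(z, a) -> z'
W_inv = rng.normal(scale=0.1, size=(ACT_DIM, 2 * LATENT_DIM))           # inverse  g(z, z') -> a
w_rew = rng.normal(scale=0.1, size=LATENT_DIM)                          # reward   r(z) -> scalar

def encode(s):       return W_enc @ s
def fwd(z, a):       return W_fwd @ np.concatenate([z, a])
def inv(z, z_next):  return W_inv @ np.concatenate([z, z_next])
def rew(z):          return w_rew @ z

# Synthetic transitions from a made-up linear environment (illustration only).
A_true = rng.normal(scale=0.3, size=(OBS_DIM, OBS_DIM))
B_true = rng.normal(scale=0.3, size=(OBS_DIM, ACT_DIM))
S      = rng.normal(size=(64, OBS_DIM))
Acts   = rng.normal(size=(64, ACT_DIM))
S_next = S @ A_true.T + Acts @ B_true.T

def fwd_loss():
    """Mean squared latent prediction error of the forward-dynamics module."""
    err = 0.0
    for s, a, sn in zip(S, Acts, S_next):
        z, zn = encode(s), encode(sn)
        err += np.sum((fwd(z, a) - zn) ** 2)
    return err / len(S)

loss_before = fwd_loss()

# Adapt ONLY the forward-dynamics module by gradient descent; the encoder
# is frozen, so the shared representation is reused rather than relearned.
lr = 0.05
for _ in range(200):
    grad = np.zeros_like(W_fwd)
    for s, a, sn in zip(S, Acts, S_next):
        z, zn = encode(s), encode(sn)
        x = np.concatenate([z, a])
        grad += 2.0 * np.outer(W_fwd @ x - zn, x)
    W_fwd -= lr * grad / len(S)

loss_after = fwd_loss()
print(f"forward-model loss: {loss_before:.4f} -> {loss_after:.4f}")
```

Because each module has its own parameters, the same pattern applies to the inverse-dynamics and reward modules: any subset can be frozen and reused, retrained on a new task, or optimized jointly with the policy.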
Keywords: Reinforcement Learning, Transfer Learning, Representation Learning
TL;DR: Decoupling training dynamics model and reward models to facilitate transfer