Self-Supervised Policy Adaptation

25 Sept 2019 (modified: 05 May 2023) · ICLR 2020 Conference Blind Submission
TL;DR: Greedy State Representation Learning (GSRL) transfers a given policy when the environment representation changes by learning to translate new observations back into the original encoding.
Abstract: We consider the problem of adapting an existing policy when the environment representation changes. Upon a change in the encoding of the observations, the agent can no longer make use of its policy, as it cannot correctly interpret the new observations. This paper proposes Greedy State Representation Learning (GSRL) to transfer the original policy by translating the new environment representation back into the original encoding. To achieve this, GSRL samples observations from both the environment and a dynamics model trained on prior experience. This generates pairs of state encodings, i.e., a new representation from the environment and a (biased) old representation from the forward model, that allow us to bootstrap a neural network model for state translation. Although early translations are unsatisfactory (as expected), the agent eventually learns a valid translation as it minimizes the error between expected and observed environment dynamics. Our experiments show the efficiency of our approach: it translates the policy in considerably fewer steps than retraining the policy would take.
Keywords: reinforcement learning, environment representation, representation learning, model mismatch
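The abstract describes GSRL's bootstrap only at a high level. The following minimal PyTorch sketch illustrates the pairing it mentions: translate a new-encoding observation, act with the frozen original policy, and regress the translator toward the forward model's (biased) old-encoding prediction. Everything here is an illustrative assumption, not the authors' implementation: the Translator architecture, the stubbed policy and forward_model, the dimensions, and the random tensors standing in for real environment steps.

```python
# Illustrative sketch of the GSRL translation bootstrap (assumed, not the
# authors' code). Random tensors stand in for real environment transitions.
import torch
import torch.nn as nn
import torch.nn.functional as F

NEW_DIM, OLD_DIM, ACT_DIM = 12, 8, 2  # hypothetical encoding/action sizes

class Translator(nn.Module):
    """Maps observations in the new encoding back to the old encoding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(NEW_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, OLD_DIM))

    def forward(self, x):
        return self.net(x)

# Stand-ins for the components GSRL assumes are given: a frozen policy over
# the old encoding and a forward dynamics model trained on prior experience.
policy = nn.Linear(OLD_DIM, ACT_DIM)                # pi(s_old) -> action
forward_model = nn.Linear(OLD_DIM + ACT_DIM, OLD_DIM)

translator = Translator()
opt = torch.optim.Adam(translator.parameters(), lr=1e-3)

obs_new = torch.randn(1, NEW_DIM)  # observation in the NEW encoding
for step in range(100):
    s_old_est = translator(obs_new)         # translate new -> old encoding
    action = policy(s_old_est).detach()     # act with the original policy
    # A real agent would call env.step(action) here; we stub the result.
    next_obs_new = torch.randn(1, NEW_DIM)
    # Biased target: the forward model predicts the next state in the OLD
    # encoding, given the current (estimated) old state and the action.
    with torch.no_grad():
        target_old = forward_model(torch.cat([s_old_est, action], dim=-1))
    # Pair up the new observation from the environment with the forward
    # model's old-encoding prediction, and minimize the dynamics error.
    loss = F.mse_loss(translator(next_obs_new), target_old)
    opt.zero_grad()
    loss.backward()
    opt.step()
    obs_new = next_obs_new
```

As the abstract notes, early targets from the forward model are biased, so early translations are poor; the sketch's loop reflects the claim that repeatedly minimizing the gap between expected and observed dynamics eventually yields a valid translation.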