Representation Balancing Offline Model-based Reinforcement Learning

Byung-Jun Lee; Jongmin Lee; Kee-Eung Kim

Published: 12 Jan 2021, Last Modified: 05 May 2023, ICLR 2021 Poster, Readers: Everyone

Keywords: Reinforcement Learning, Model-based Reinforcement Learning, Offline Reinforcement Learning, Batch Reinforcement Learning, Off-policy policy evaluation

Abstract: One of the main challenges in offline and off-policy reinforcement learning is coping with the distribution shift that arises from the mismatch between the target policy and the data collection policy. In this paper, we focus on a model-based approach, particularly on learning a representation for a model of the environment that is robust under this distribution shift, which was first studied by Representation Balancing MDP (RepBM). Although this prior work has shown promising results, a number of shortcomings still hinder its applicability to practical tasks. In particular, we address the curse of horizon exhibited by RepBM, which rejects most of the pre-collected data in long-term tasks. We present a new objective for model learning motivated by recent advances in the estimation of stationary distribution corrections. This effectively overcomes the aforementioned limitation of RepBM and extends naturally to continuous action spaces and stochastic policies. We also present an offline model-based policy optimization method using this new objective, yielding state-of-the-art performance on a representative set of benchmark offline RL tasks.

One-sentence Summary: We present RepB-SDE, a framework for balancing the model representation with stationary distribution estimation, aiming at obtaining a model robust to the distribution shift that arises in off-policy and offline RL.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Data: [D4RL](https://paperswithcode.com/dataset/d4rl), [MuJoCo](https://paperswithcode.com/dataset/mujoco)
