Keywords: reinforcement learning, offline reinforcement learning, non-stationarity
TL;DR: We investigate Offline RL with datasets that include a gradually evolving non-stationarity.
Abstract: Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy. Offline RL aims to address this issue by learning from a fixed dataset of transitions collected by a separate behavior policy. We address a novel Offline RL problem setting in which, while the dataset is being collected, the transition and reward functions gradually change between episodes but remain constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
We analyze our proposed method and show that it performs well in simple continuous control tasks and challenging, high-dimensional locomotion tasks.
We show that our method often matches oracle performance and outperforms the baselines.
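As a rough illustration only (this is not the paper's actual architecture), a Contrastive Predictive Coding-style objective for inferring a per-episode latent that tracks gradual non-stationarity could look like the sketch below. The module names, latent dimension, pooling choice, and the strategy of treating temporally adjacent episodes as positives are all assumptions made for this example.

```python
# Minimal sketch, assuming per-episode transition batches and an InfoNCE-style
# contrastive objective; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EpisodeEncoder(nn.Module):
    """Encodes the (s, a, r, s') transitions of one episode into a single latent."""

    def __init__(self, transition_dim: int, latent_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, transitions: torch.Tensor) -> torch.Tensor:
        # transitions: (num_transitions, transition_dim); mean-pool to one latent per episode.
        return self.net(transitions).mean(dim=0)


def infonce_loss(anchor: torch.Tensor, positive: torch.Tensor,
                 negatives: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE: pull latents of temporally adjacent episodes together and push
    apart latents of episodes collected far apart in time (the assumed negatives)."""
    pos_logit = (anchor * positive).sum(-1, keepdim=True) / temperature   # (1,)
    neg_logits = negatives @ anchor / temperature                          # (num_negatives,)
    logits = torch.cat([pos_logit, neg_logits]).unsqueeze(0)               # (1, 1 + num_negatives)
    labels = torch.zeros(1, dtype=torch.long)                              # positive sits at index 0
    return F.cross_entropy(logits, labels)
```

Under this sketch, the inferred latent would be fed to the policy during offline training, and a separate predictor would extrapolate its evolution at evaluation time, mirroring the identify/account/predict steps described in the abstract.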
Submission Number: 287