Keywords: offline reinforcement learning, generalization
Abstract: In this work, we study offline reinforcement learning (RL) with the zero-shot generalization (ZSG) property, where the agent has access to an offline dataset of experiences from different environments, and its goal is to learn a policy over the training environments that performs well on test environments without further interaction. Existing work has shown that classical offline RL fails to generalize to new, unseen environments. To address this issue, we propose new offline RL frameworks with ZSG, based on empirical risk minimization or proximal policy optimization. We show, both theoretically and empirically, that our frameworks find a near-optimal policy with ZSG, from general environments to specific settings such as linear Markov decision processes (MDPs). Our results serve as a first step toward understanding the foundation of the generalization phenomenon in offline reinforcement learning.
Submission Number: 33