Keywords: Offline Reinforcement Learning, Active Learning
TL;DR: We devise a methodology that iteratively learns improving policies by combining offline RL (to avoid out-of-distribution state-actions), return-based sampling (to cope with diverse offline data), and active learning (to probe high-uncertainty state-actions).
Abstract: Offline Reinforcement Learning (RL) has emerged as a promising approach to address
real-world challenges where online interactions with the environment are limited, risky,
or costly. Although recent advances produce high-quality policies from offline data, there is currently no systematic methodology for continuing to improve them without resorting to online fine-tuning. This paper proposes to repurpose Offline RL to produce a sequence of improving policies, namely, Iterative Offline Reinforcement Learning (IORL). To produce such a sequence, IORL has to cope with imbalanced offline datasets and perform controlled environment exploration. Specifically, we introduce "Return-based Sampling" as a means to selectively prioritize experience from high-return trajectories, and active-learning-driven "Dataset Uncertainty Sampling" to probe state-actions with probability inversely proportional to their density in the dataset. We demonstrate that our proposed approach produces policies that achieve monotonically increasing average returns, from 65.4 to 140.2, in the Atari environment.
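Below is a minimal sketch of how the two samplers could be instantiated, assuming a softmax weighting over trajectory returns and a count-based proxy for dataset density; the function names, the temperature parameter, and the count-based density estimate are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def return_based_sampling_weights(trajectory_returns, temperature=1.0):
    # Softmax over trajectory returns: higher-return trajectories are
    # sampled more often when building the next offline training batch.
    returns = np.asarray(trajectory_returns, dtype=np.float64)
    scaled = (returns - returns.max()) / temperature  # shift for numerical stability
    weights = np.exp(scaled)
    return weights / weights.sum()

def density_inverse_sampling_weights(state_action_counts, eps=1e-6):
    # Probe state-actions in inverse proportion to how often they appear in
    # the offline dataset (counts used here as a stand-in for density).
    counts = np.asarray(state_action_counts, dtype=np.float64)
    weights = 1.0 / (counts + eps)
    return weights / weights.sum()

# Example: draw trajectory indices for the next iteration of offline training.
rng = np.random.default_rng(seed=0)
trajectory_returns = [12.0, 85.0, 140.0, 40.0]
probs = return_based_sampling_weights(trajectory_returns, temperature=20.0)
sampled_trajectories = rng.choice(len(trajectory_returns), size=8, p=probs)
```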
Submission Number: 47