The guide and the explorer: smart agents for resource-limited iterated batch reinforcement learning

Albert Thomas; Balázs Kégl; Othman Gaizi; Gabriel Hurtado

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 | ICLR 2022 Submitted | Readers: Everyone

Keywords: Model-based reinforcement learning, Dyna, exploration, planning, DQN

Abstract: Iterated batch reinforcement learning (RL) is a growing subfield fueled by the demand from systems engineers for intelligent control solutions that they can apply within their technical and organizational constraints. Model-based RL (MBRL) suits this scenario well for its sample efficiency and modularity. Recent MBRL techniques combine efficient neural system models with classical planning (like model predictive control; MPC). In this paper we add two components to this classical setup. The first is a Dyna-style policy learned on the system model using model-free techniques. We call it the guide since it guides the planner. The second component is the explorer, a strategy to expand the limited knowledge of the guide during planning. Through a rigorous ablation study we show that exploration is crucial for optimal performance. We apply this approach with a DQN guide and a heating explorer to improve the state of the art of the resource-limited Acrobot benchmark system by about 10%.
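To make the guide/explorer/planner split concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes the guide exposes Q-values for a discrete action set, that the planner simulates rollouts on the learned system model and executes the first action of the best one, and that the "heating" explorer can be approximated by sampling from a softmax over the guide's Q-values at progressively higher temperatures. The names `guided_plan`, `toy_model_step`, `toy_guide_q`, and the temperature schedule are all hypothetical.

```python
import numpy as np


def softmax(q_values, temperature):
    """Boltzmann distribution over actions; a higher temperature is closer to uniform."""
    z = (q_values - np.max(q_values)) / temperature
    p = np.exp(z)
    return p / p.sum()


def guided_plan(state, model_step, guide_q, n_actions, temperatures, horizon=5, rng=None):
    """Return the first action of the best rollout simulated on the model.

    One rollout is simulated per temperature (an assumed stand-in for the
    heating explorer): low temperatures follow the guide almost greedily,
    high temperatures explore actions the guide rates poorly.
    """
    rng = np.random.default_rng() if rng is None else rng
    best_return, best_action = -np.inf, None
    for temperature in temperatures:
        s, total, first = state, 0.0, None
        for _ in range(horizon):
            probs = softmax(guide_q(s), temperature)
            a = int(rng.choice(n_actions, p=probs))
            if first is None:
                first = a  # remember the action that would actually be executed
            s, r = model_step(s, a)
            total += r
        if total > best_return:
            best_return, best_action = total, first
    return best_action


# Hypothetical toy system model: a 1-D chain where action 1 moves right,
# action 0 moves left, and the reward is the negative distance to GOAL.
GOAL = 3


def toy_model_step(s, a):
    s_next = s + (1 if a == 1 else -1)
    return s_next, -abs(s_next - GOAL)


# Hypothetical guide: fixed Q-values that always prefer moving right.
def toy_guide_q(s):
    return np.array([0.0, 1.0])


action = guided_plan(
    0, toy_model_step, toy_guide_q, n_actions=2,
    temperatures=np.geomspace(0.1, 10.0, 16),
    rng=np.random.default_rng(0),
)
```

In this sketch the guide only biases which rollouts the planner considers; the model-based return still decides which action is executed, so exploration at higher temperatures can override a guide that is wrong in some region of the state space.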

One-sentence Summary: Smart agents for resource-limited iterated batch reinforcement learning

Supplementary Material: zip

