Information-Directed Policy Search in Sparse-Reward Settings via the Occupancy Information Ratio

Published: 01 Jan 2023, Last Modified: 12 May 2023, Venue: CISS 2023, Readers: Everyone
Abstract: This paper examines a new measure of the exploration/exploitation trade-off in reinforcement learning (RL) called the occupancy information ratio (OIR). To this end, the paper derives the Information-Directed Actor-Critic (IDAC) algorithm for solving the OIR problem, provides an overview of the rich theory underlying IDAC and related OIR policy gradient methods, and experimentally investigates the advantages of such methods. The central contribution of this paper is to provide empirical evidence that, due to the form of the OIR objective, IDAC enjoys superior performance over vanilla RL methods in sparse-reward environments.