Diverse Exploration via InfoMax Options

28 Sept 2020 (modified: 22 Oct 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: Reinforcement Learning, Hierarchical Reinforcement Learning, Exploration
Abstract: In this paper, we study the problem of autonomously discovering temporally abstracted actions, or options, for exploration in reinforcement learning. To learn diverse options suitable for exploration, we introduce the infomax termination objective, defined as the mutual information between options and their corresponding state transitions. We derive a scalable optimization scheme that maximizes this objective via the termination condition of options, yielding the InfoMax Option Critic (IMOC) algorithm. Through illustrative experiments, we empirically show that IMOC learns diverse options and utilizes them for exploration. Moreover, we show that IMOC scales well to continuous control tasks.
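As a minimal sketch of the objective named in the abstract (the notation below is assumed for illustration, not quoted from the paper): writing O for the option in effect and (S, S') for the state transition it induces, the infomax termination objective maximized with respect to the termination condition β can be written as a mutual information:

```latex
% Sketch of the infomax termination objective (notation assumed, not
% taken verbatim from the paper): O is the option random variable,
% (S, S') the state transition induced by executing O, and beta the
% option termination condition being optimized.
\[
  J(\beta) \;=\; I\bigl(O;\,(S, S')\bigr)
           \;=\; H(O) \;-\; H\bigl(O \mid (S, S')\bigr)
\]
```

Under this reading, maximizing H(O) − H(O | (S, S')) favors options whose induced transitions are mutually distinguishable, which is the sense in which the learned options are diverse.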
One-sentence Summary: For exploration in RL, we propose a method for learning diverse options end-to-end.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Community Implementations: 3 code implementations (CatalyzeX): https://www.catalyzex.com/paper/arxiv:2010.02756/code
Reviewed Version (pdf): https://openreview.net/references/pdf?id=tgMSzz3Ilq