Optimizing Information Bottleneck in Reinforcement Learning: A Stein Variational Approach

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission · Readers: Everyone
Keywords: Information Bottleneck, Reinforcement Learning, Stein Variational Gradient
Abstract: The information bottleneck (IB) principle is an elegant and useful learning framework for extracting the information an input feature contains that is relevant to the target. The principle has been widely used in supervised and unsupervised learning. In this paper, we investigate the effectiveness of the IB framework in reinforcement learning (RL). We first derive the IB objective in the RL setting and then analytically derive the optimal conditional distribution of the resulting optimization problem. Following the variational information bottleneck (VIB), we provide a variational lower bound using a prior distribution. Unlike VIB, we propose to optimize this lower bound with the amortized Stein variational gradient method. We incorporate this framework into two popular RL algorithms: the advantage actor-critic (A2C) algorithm and the proximal policy optimization (PPO) algorithm. Our experimental results show that our framework improves the sample efficiency of vanilla A2C and PPO. We also show that our method achieves better performance than VIB and mutual information neural estimation (MINE), two other popular approaches for optimizing the information bottleneck objective in supervised learning.
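For orientation, below is a minimal sketch of the generic IB objective, the VIB-style bound on its compression term, and the standard (non-amortized) Stein variational gradient descent update that the amortized method builds on. The notation here (states $S$, actions $A$, latent $Z$, prior $r(z)$, kernel $k$) is assumed for illustration; the paper's RL-specific objective may differ in detail.

```latex
% Generic IB objective (notation assumed): compress the state S into a
% latent Z that remains predictive of the target A.
\max_{p(z \mid s)} \; I(Z; A) - \beta \, I(Z; S)

% VIB-style variational bound on the compression term, using a prior r(z):
I(Z; S) \;\le\; \mathbb{E}_{p(s)} \big[ \mathrm{KL}\big( p(z \mid s) \,\|\, r(z) \big) \big]

% Standard SVGD update on particles z_1, \dots, z_n with kernel k; the
% amortized variant distills this update into an encoder network:
z_i \leftarrow z_i + \epsilon \, \hat{\phi}(z_i), \qquad
\hat{\phi}(z) = \frac{1}{n} \sum_{j=1}^{n}
  \big[ k(z_j, z) \, \nabla_{z_j} \log p(z_j) + \nabla_{z_j} k(z_j, z) \big]
```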
One-sentence Summary: We establish the information bottleneck framework in reinforcement learning and propose a new algorithm to optimize it.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=6re3kFqao