[Replication] A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment

02 Dec 2019 (modified: 05 May 2023) · NeurIPS 2019 Reproducibility Challenge Blind Report · Readers: Everyone
Abstract: Designing learning agents that gain broad competence in a self-motivated manner is a longstanding goal of reinforcement learning. Empowerment is a task-agnostic, information-theoretic quantity that has recently been used to intrinsically motivate reinforcement learning agents. Leibfried et al. (2019) showed how to combine empowerment with traditional task-specific reward maximization. In this work, we replicate the main empirical results of their paper. In particular, we reproduce the paper's main algorithm, empowered actor-critic (EAC), and compare its performance with state-of-the-art baselines: soft actor-critic (SAC), proximal policy optimization (PPO), and deep deterministic policy gradients (DDPG), on a series of continuous control tasks in the MuJoCo simulator. We find that the performance of our implementation of EAC closely follows that of the original paper. However, our empirical findings also suggest that EAC is unable to improve upon baseline actor-critic algorithms. We share our code, raw learning curves, and the scripts used to produce the figures in this paper.
Track: Replicability
NeurIPS Paper Id: https://openreview.net/forum?id=rye16VSl8S