Abstract: We introduce a new unsupervised reinforcement learning method for discovering the set of intrinsic options available to an agent. This set is learned by maximizing the number of different states an agent can reliably reach, as measured by the mutual information between the set of options and option termination states. To this end, we instantiate two policy gradient based algorithms, one that creates an explicit embedding space of options and one that represents options implicitly. Both algorithms also yield a tractable and explicit empowerment measure, which is useful for empowerment maximizing agents. Furthermore, they scale well with function approximation and we demonstrate their applicability on a range of tasks.
Conflicts: google.com
Keywords: Unsupervised Learning, Deep learning
3 Replies
Loading