Self-Supervised State-Control through Intrinsic Mutual Information RewardsDownload PDF

25 Sep 2019 (modified: 24 Dec 2019)ICLR 2020 Conference Blind SubmissionReaders: Everyone
  • Original Pdf: pdf
  • Keywords: Intrinsic Reward, Deep Reinforcement Learning, Skill Discovery, Mutual Information, Self-Supervised Learning, Unsupervised Learning
  • TL;DR: This paper introduces Mutual Information-based State-Control, a self-supervised reinforcement learning framework for discovering robotic manipulation skills.
  • Abstract: Learning to discover useful skills without a manually-designed reward function would have many applications, yet is still a challenge for reinforcement learning. In this paper, we propose Mutual Information-based State-Control (MISC), a new self-supervised Reinforcement Learning approach for learning to control states of interest without any external reward function. We formulate the intrinsic objective as rewarding the skills that maximize the mutual information between the context states and the states of interest. For example, in robotic manipulation tasks, the context states are the robot states and the states of interest are the states of an object. We evaluate our approach for different simulated robotic manipulation tasks from OpenAI Gym. We show that our method is able to learn to manipulate the object, such as pushing and picking up, purely based on the intrinsic mutual information rewards. Furthermore, the pre-trained policy and mutual information discriminator can be used to accelerate learning to achieve high task rewards. Our results show that the mutual information between the context states and the states of interest can be an effective ingredient for overcoming challenges in robotic manipulation tasks with sparse rewards. A video showing experimental results is available at
  • Code:
8 Replies