Variational Diversity Maximization for Hierarchical Skill Discovery

Published: 01 Jan 2023, Last Modified: 14 May 2023. Neural Process. Lett. 2023.
Abstract: Hierarchical Reinforcement Learning (HRL) has led to rapid progress on structured exploration and on solving challenging tasks. In HRL, the agent plans with skills instead of primitive actions, which effectively shortens the task horizon and decreases the complexity of the problem. Most works on skill discovery focus on finding diverse skills. However, existing methods fail to increase the diversity of the states visited while the agent performs skills. In this paper, "Variational Diversity Maximization" (VIM) is proposed to address this problem. VIM encourages the agent to maximize an information-theoretic objective: the entropy of states conditioned on skills. The agent thus explores the environment more thoroughly when performing skills, increasing the likelihood of finding the optimal policy. Maximizing the proposed conditional entropy is not trivial; VIM approximates it through the reconstruction error of a conditional variational autoencoder, which solves the problem elegantly. Besides this entropy, the mutual information between states and skills is also maximized to discover diverse skills, as in other methods. Furthermore, a novel method is proposed to measure the diversity of skills efficiently. Experimental results suggest that VIM allows the agent to learn exploratory skills in an unsupervised way, and that the agent achieves strong performance on challenging tasks with these learned skills. Moreover, the proposed method can easily be combined with other planning algorithms to solve complicated tasks.
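The core computational idea in the abstract, approximating the conditional state entropy H(S|Z) = -E[log p(s|z)] by the reconstruction error of a conditional variational autoencoder, can be sketched in a few lines: since the CVAE's ELBO lower-bounds log p(s|z), the per-sample reconstruction error is a tractable proxy for -log p(s|z) and can serve as an intrinsic reward for the policy. The PyTorch sketch below illustrates this idea only; the class name SkillCVAE, the intrinsic_reward helper, the Gaussian/MSE likelihood, and all dimensions are assumptions, not the paper's actual implementation.

```python
# Minimal sketch (not the paper's code): a conditional VAE whose
# reconstruction error approximates -log p(s | z), so the per-state
# error can be used as an intrinsic reward for maximizing the
# conditional state entropy H(S | Z). All names/dims are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkillCVAE(nn.Module):
    def __init__(self, state_dim, skill_dim, latent_dim=16, hidden=128):
        super().__init__()
        # Encoder q(u | s, z): state and skill in, Gaussian latent out.
        self.enc = nn.Sequential(
            nn.Linear(state_dim + skill_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim))
        # Decoder p(s | u, z): latent and skill in, reconstructed state out.
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + skill_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim))

    def forward(self, s, z):
        mu, log_var = self.enc(torch.cat([s, z], dim=-1)).chunk(2, dim=-1)
        u = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterize
        s_hat = self.dec(torch.cat([u, z], dim=-1))
        # Per-sample reconstruction error; under a Gaussian likelihood this
        # is proportional to -log p(s | u, z).
        recon = F.mse_loss(s_hat, s, reduction="none").sum(-1)
        # KL(q(u|s,z) || N(0, I)), the other ELBO term.
        kl = 0.5 * (mu.pow(2) + log_var.exp() - 1.0 - log_var).sum(-1)
        return recon, kl

def intrinsic_reward(cvae, s, z):
    """High reconstruction error = hard-to-predict state given the skill,
    i.e. high conditional entropy; reward the agent for reaching it."""
    with torch.no_grad():
        recon, _ = cvae(s, z)
    return recon  # shape: (batch,)

# Training step: the CVAE minimizes the ELBO (recon + KL) on visited
# states, while the policy receives the reconstruction error as a bonus.
cvae = SkillCVAE(state_dim=17, skill_dim=8)
opt = torch.optim.Adam(cvae.parameters(), lr=3e-4)
s, z = torch.randn(64, 17), torch.randn(64, 8)
recon, kl = cvae(s, z)
(recon + kl).mean().backward()
opt.step()
```

In such a setup the CVAE and the policy pull in opposite directions: the model is trained to predict states from skills, and the policy is rewarded for reaching states the model predicts poorly, which is one plausible way to realize the entropy-maximization objective described above.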