Published: 09 May 2025 · License: CC BY 4.0
Representation learning and unsupervised skill discovery remain key challenges in training reinforcement learning agents. We show that the empowerment objective, which measures the maximum number of distinct skills an agent can execute from a given representation, enables agents to perform representation learning and unsupervised skill discovery simultaneously. Our theoretical analysis shows that empowerment can help agents learn sufficient-statistic representations of observations: the maximum number of distinct skills executable from a learned representation grows when that representation does not merge observations associated with different sufficient statistics. To jointly learn representations and skills, we introduce a new approach to mutual-information maximization based on bandit reinforcement learning. Under this approach, the agent learns a bandit policy that maps the representation at the start of a skill to a vector containing the parameters of the skill-conditioned policy. The reward for a skill-conditioned policy action is the variational lower bound on mutual information conditioned on that policy, which measures the diversity of the skill-conditioned policy's actions. Empirically, we demonstrate that our approach can (i) learn significantly more skills than existing unsupervised skill discovery approaches and (ii) learn a representation suitable for downstream reinforcement learning applications.
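The variational lower bound used as the skill reward can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes a standard discriminator-based bound (as in prior skill-discovery work), where a learned model `q(z | s)` predicts which skill `z` produced state `s`, and the per-step reward is `log q(z|s) - log p(z)` under a uniform skill prior. The linear discriminator `W` here is a placeholder for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

n_skills = 4
state_dim = 3
prior = np.full(n_skills, 1.0 / n_skills)  # uniform skill prior p(z)

# Placeholder linear discriminator q(z | s); in practice this is a
# trained network that classifies which skill visited the state.
W = rng.normal(size=(state_dim, n_skills))

def discriminator_probs(state):
    """Softmax over skill logits: an estimate of q(z | s)."""
    logits = state @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()

state = rng.normal(size=state_dim)  # state reached by the skill policy
z = 2                               # index of the skill being executed

q = discriminator_probs(state)
# Variational lower-bound reward for this transition:
# high when the discriminator can identify the skill from the state,
# i.e. when skills visit distinguishable states.
reward = np.log(q[z]) - np.log(prior[z])
```

Summing this reward over states visited by each skill gives a lower bound on the mutual information between skills and states, which is the quantity the bandit policy's reward is built from.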