The impact of intrinsic rewards on exploration in Reinforcement Learning

Aya Kayal, Eduardo Pignatelli, Laura Toni

Published: 31 May 2025. Last modified: 04 May 2026. Venue: Neural Computing and Applications Journal. License: CC BY 4.0.

Abstract: One of the open challenges in Reinforcement Learning (RL) is the hard-exploration problem in sparse-reward environments. Various types of intrinsic rewards have been proposed to address this challenge by encouraging diversity. This diversity can be imposed at different levels, encouraging the agent to explore different states, policies, or behaviors (State-, Policy-, and Skill-level diversity, respectively). However, the impact of diversity on the agent's behavior remains unclear. In this work, we aim to fill this gap by studying the effect of the different levels of diversity imposed by intrinsic rewards on the exploration patterns of RL agents. We select four intrinsic rewards (State Count, Intrinsic Curiosity Module (ICM), Maximum Entropy, and Diversity is All You Need (DIAYN)), each targeting one of these diversity levels. We conduct an empirical study on MiniGrid environments to compare their impact on exploration using several exploration-related metrics: episodic return, observation coverage, agent position coverage, policy entropy, and the time needed to reach the sparse reward. The main finding of the study is that State Count yields the best exploration performance with low-dimensional observations. However, with RGB observations, the performance of State Count degrades severely, mainly because of representation learning challenges. Conversely, Maximum Entropy is less affected, resulting in more robust exploration, although it is not always optimal. Lastly, our empirical study reveals that learning diverse skills with DIAYN, often linked to improved robustness and generalization, does not promote exploration in MiniGrid environments. This is because (i) learning the skill space itself can be challenging, and (ii) exploration within the skill space prioritizes differentiating between behaviors rather than achieving uniform state visitation.
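The three diversity levels named in the abstract can be made concrete with the minimal sketch below. It is not the authors' implementation: it assumes the commonly used count-based bonus β/√N(s) for State Count, the Shannon entropy of a discrete action distribution for the policy-level (Maximum Entropy) term, and the DIAYN-style reward log q(z|s) − log p(z) with a uniform skill prior. The names `StateCountBonus`, `policy_entropy`, `diayn_reward`, `position_coverage`, and the parameter `beta` are hypothetical. ICM is omitted because it needs learned forward and inverse models.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Sequence


class StateCountBonus:
    """State-level diversity (sketch): bonus beta / sqrt(N(s)) for a visited state s.

    Assumes states are hashable keys, e.g. the agent's (x, y) grid position.
    `beta` is a hypothetical scale hyperparameter.
    """

    def __init__(self, beta: float = 1.0) -> None:
        self.beta = beta
        self.counts: Counter = Counter()

    def __call__(self, state: Hashable) -> float:
        # Increment the visit count first, so the first visit yields beta / 1.
        self.counts[state] += 1
        return self.beta / math.sqrt(self.counts[state])


def policy_entropy(probs: Sequence[float]) -> float:
    """Policy-level diversity (sketch): Shannon entropy, in nats, of pi(.|s)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def diayn_reward(log_q_z_given_s: float, num_skills: int) -> float:
    """Skill-level diversity (sketch): log q(z|s) - log p(z) with uniform p(z) = 1/K.

    `log_q_z_given_s` would come from a learned skill discriminator (assumed here).
    """
    return log_q_z_given_s + math.log(num_skills)


def position_coverage(visited_cells: Iterable[Hashable], num_reachable_cells: int) -> float:
    """Fraction of reachable grid cells the agent has visited at least once."""
    return len(set(visited_cells)) / num_reachable_cells


if __name__ == "__main__":
    bonus = StateCountBonus(beta=1.0)
    print(bonus((1, 1)), bonus((1, 1)))        # 1.0, then 1/sqrt(2)
    print(policy_entropy([0.25] * 4))          # approx. log(4), the maximum for 4 actions
    print(diayn_reward(math.log(1.0), num_skills=4))  # approx. log(4) for a fully confident discriminator
    print(position_coverage([(1, 1), (1, 2), (1, 1)], num_reachable_cells=4))  # 0.5
```

The tabular counter needs a discrete, hashable state key such as the agent's grid position. Raw RGB frames rarely repeat exactly, so in that setting a count-based bonus has to rely on some learned or hashed representation, which is one way to see why representation learning matters for State Count with RGB observations.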