Bounded Active Exploration for Model-Based Reinforcement Learning

Ting Qiao, Henry Williams, Bruce A. MacDonald

Published: 2025 · Last Modified: 24 Mar 2026 · CASE 2025 · CC BY-SA 4.0
Abstract: A precise world model is imperative for the performance of Model-Based Reinforcement Learning (MBRL). Active exploration enhances world models by repeatedly visiting uncertain regions where the world model lacks proficiency. However, this strategy may introduce an objective mismatch between maximising rewards and developing an accurate world model. In response to this challenge, we propose a novel exploration strategy, termed bounded active exploration (BAE), that confines exploration behaviours to action candidates derived from a soft reward-exploitation policy. As the policy becomes 'confident', these candidates converge to a single decisive action. We evaluate BAE with algorithms from two disparate MBRL research streams on simulation and real-world tasks. The empirical results demonstrate the superiority of our novel exploration strategy in most simulation tasks. BAE not only elevates MBRL agents' data efficiency but also provides an alternative method for applying intrinsic motivations in Reinforcement Learning.
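The abstract's core mechanism — drawing action candidates from a soft exploitation policy and exploring only within that bounded set — can be illustrated with a minimal sketch. This is not the authors' implementation; the softmax candidate sampling, the `uncertainty` bonus, and all function names here are illustrative assumptions about how such a scheme might look in a discrete-action setting.

```python
import numpy as np

rng = np.random.default_rng(0)

def candidate_actions(q_values, temperature, n_candidates):
    """Sample a bounded set of action candidates from a softmax
    ('soft') exploitation policy over value estimates (assumed form).
    As temperature -> 0 the policy sharpens, so the candidate set
    collapses onto the single greedy action."""
    logits = np.asarray(q_values, dtype=float) / max(temperature, 1e-8)
    probs = np.exp(logits - logits.max())   # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), size=n_candidates, p=probs)

def bounded_exploration_step(q_values, uncertainty, temperature, n_candidates=5):
    """Among the exploitation-derived candidates only, pick the action
    with the highest world-model uncertainty (hypothetical bonus).
    Exploration is thus bounded by the exploitation policy."""
    cands = candidate_actions(q_values, temperature, n_candidates)
    return int(cands[np.argmax(np.asarray(uncertainty)[cands])])
```

With a near-zero temperature the candidate set contains only the greedy action, so exploration vanishes; with a higher temperature the agent may still pick an uncertain action, but only among plausibly rewarding ones — a rough analogue of the objective-mismatch mitigation the abstract describes.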