We address the challenge of efficient exploration in model-based reinforcement learning (MBRL), where the system dynamics are unknown and the RL agent must learn directly from online interactions. We propose Optimistic-MBRL (OMBRL), an approach based on the principle of optimism in the face of uncertainty. OMBRL learns an uncertainty-aware dynamics model and greedily maximizes a weighted sum of the extrinsic reward and the agent's epistemic uncertainty. Under common regularity assumptions on the system, we show that OMBRL has sublinear regret for nonlinear dynamics in the (i) finite-horizon, (ii) discounted infinite-horizon, and (iii) non-episodic settings. Additionally, OMBRL offers a flexible and scalable solution for principled exploration. We evaluate OMBRL on state-based and visual control environments, where it performs favorably against all baselines across tasks. In hardware experiments on a dynamic RC car, OMBRL outperforms the state of the art, illustrating the benefits of principled exploration for MBRL.
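To give intuition for the objective described above, the following is a minimal sketch of the optimistic exploration principle: ensemble disagreement stands in for epistemic uncertainty, and the agent scores actions by the extrinsic reward plus a weighted uncertainty bonus. All names here (`EnsembleDynamics`, `optimistic_reward`, `beta`) are illustrative assumptions for this sketch, not the paper's actual implementation or API.

```python
import numpy as np


class EnsembleDynamics:
    """Toy ensemble of linear dynamics models. Stands in for any
    uncertainty-aware learned model (e.g., probabilistic ensembles);
    disagreement across members estimates epistemic uncertainty."""

    def __init__(self, state_dim, action_dim, n_members=5, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = state_dim + action_dim
        # Each member has its own weights, e.g., from bootstrapped training data.
        self.weights = [rng.normal(scale=0.1, size=(in_dim, state_dim))
                        for _ in range(n_members)]

    def predict_all(self, state, action):
        """Next-state predictions from every member: (n_members, state_dim)."""
        x = np.concatenate([state, action])
        return np.stack([x @ w for w in self.weights])

    def epistemic_std(self, state, action):
        """Scalar epistemic-uncertainty proxy: mean per-dimension std
        of the ensemble's next-state predictions."""
        preds = self.predict_all(state, action)
        return float(preds.std(axis=0).mean())


def optimistic_reward(extrinsic_reward, model, state, action, beta=1.0):
    """Weighted sum of extrinsic reward and epistemic uncertainty,
    which the agent greedily maximizes (optimism in the face of
    uncertainty); beta trades off exploitation vs. exploration."""
    return extrinsic_reward(state, action) + beta * model.epistemic_std(state, action)


# Usage: greedy action selection over a candidate set under the
# optimistic objective, with a placeholder extrinsic reward.
model = EnsembleDynamics(state_dim=3, action_dim=2)
state = np.zeros(3)
rng = np.random.default_rng(1)
candidates = [rng.uniform(-1.0, 1.0, size=2) for _ in range(8)]
r = lambda s, a: -float(np.sum(a ** 2))  # placeholder extrinsic reward
best = max(candidates,
           key=lambda a: optimistic_reward(r, model, state, a, beta=2.0))
```

With `beta = 0` the agent is purely greedy on the extrinsic reward; larger `beta` biases it toward state-action pairs where the model members disagree, i.e., where more can be learned.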