Optimism via Intrinsic Rewards: Scalable and Principled Exploration for Model-based Reinforcement Learning

Bhavya Sukhija; Lenart Treven; Carmelo Sferrazza; Florian Dorfler; Pieter Abbeel; Andreas Krause

Published: 28 Feb 2025, Last Modified: 27 Mar 2025, Venue: WRL@ICLR 2025 Poster, License: CC BY 4.0

Track: full paper

Keywords: Reinforcement Learning (RL) Theory, Deep RL, Regret bounds for RL, Robotics

TL;DR: A simple, scalable, and efficient RL algorithm with regret bounds for general RL settings, evaluated in state-based, visual-control, and hardware experiments.

Abstract: We address the challenge of efficient exploration in model-based reinforcement learning (MBRL), where the system dynamics are unknown and the RL agent must learn directly from online interactions. We propose **O**ptimistic-**MBRL** (OMBRL), an approach based on the principle of optimism in the face of uncertainty. OMBRL learns an uncertainty-aware dynamics model and greedily maximizes a weighted sum of the extrinsic reward and the agent's epistemic uncertainty. Under common regularity assumptions on the system, we show that OMBRL has sublinear regret for nonlinear dynamics in the (i) finite-horizon, (ii) discounted infinite-horizon, and (iii) non-episodic setting. Additionally, OMBRL offers a flexible and scalable solution for principled exploration. We evaluate OMBRL on state-based and visual-control environments, where it displays favorable performance across all tasks and baselines. In hardware experiments on a dynamic RC car, OMBRL outperforms the state-of-the-art, illustrating the benefits of principled exploration for MBRL.
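To make the objective in the abstract concrete, the following is a minimal sketch of a reward augmented with an epistemic-uncertainty bonus. It is not the authors' implementation. It assumes that epistemic uncertainty is approximated by the disagreement (standard deviation) of an ensemble of learned dynamics models, and it scores actions one step ahead instead of planning over a horizon. The names `epistemic_uncertainty`, `optimistic_reward`, `greedy_action`, and the parameter `weight` are hypothetical and chosen for illustration only.

```python
import numpy as np


def epistemic_uncertainty(ensemble_predictions):
    """Ensemble disagreement, summed over state dimensions.

    Assumption: disagreement between ensemble members stands in for the
    model's epistemic uncertainty. `ensemble_predictions` has shape
    (n_models, state_dim) and holds each member's next-state prediction
    for the same (state, action) pair.
    """
    return float(np.std(ensemble_predictions, axis=0).sum())


def optimistic_reward(extrinsic_reward, ensemble_predictions, weight=1.0):
    """Weighted sum of extrinsic reward and uncertainty bonus (hypothetical helper)."""
    return extrinsic_reward + weight * epistemic_uncertainty(ensemble_predictions)


def greedy_action(state, candidate_actions, models, reward_fn, weight=1.0):
    """Return the candidate action with the highest one-step optimistic reward.

    Simplification: a single-step lookahead over a finite candidate set,
    used here only to illustrate the augmented objective.
    """
    scores = []
    for action in candidate_actions:
        # Query every ensemble member for the same (state, action) pair.
        preds = np.stack([model(state, action) for model in models])
        scores.append(optimistic_reward(reward_fn(state, action), preds, weight))
    return candidate_actions[int(np.argmax(scores))]
```

With `weight=0.0` the rule reduces to greedy maximization of the extrinsic reward. Increasing `weight` shifts the choice toward actions whose outcomes the ensemble disagrees on, which is the exploration effect the abstract describes.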

Supplementary Material: zip

Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.

Presenter: ~Bhavya_Sukhija1

Format: Yes, the presenting author will definitely attend in person because they are attending ICLR for other complementary reasons.

Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding availability would significantly influence their ability to attend the workshop in person.

Submission Number: 39
