Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling

Jasmine Bayrooti; Carl Henrik Ek; Amanda Prorok

Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling

Jasmine Bayrooti, Carl Henrik Ek, Amanda Prorok

Published: 22 Jan 2025, Last Modified: 01 Mar 2025ICLR 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: reinforcement learning, model-based reinforcement learning, optimistic exploration

TL;DR: We introduce a theoretically-grounded approach to optimistic exploration that leverages joint uncertainty over states and rewards for improved sample efficiency.

Abstract: Learning complex robot behavior through interactions with the environment necessitates principled exploration. Effective strategies should prioritize exploring regions of the state-action space that maximize rewards, with optimistic exploration emerging as a promising direction aligned with this idea and enabling sample-efficient reinforcement learning. However, existing methods overlook a crucial aspect: the need for optimism to be informed by a belief connecting the reward and state. To address this, we propose a practical, theoretically grounded approach to optimistic exploration based on Thompson sampling. Our approach is the first that allows for reasoning about _joint_ uncertainty over transitions and rewards for optimistic exploration. We apply our method on a set of MuJoCo and VMAS continuous control tasks. Our experiments demonstrate that optimistic exploration significantly accelerates learning in environments with sparse rewards, action penalties, and difficult-to-explore regions. Furthermore, we provide insights into when optimism is beneficial and emphasize the critical role of model uncertainty in guiding exploration.

Supplementary Material: zip

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 12371

Loading