Practical $\epsilon$-Exploring Thompson Sampling for Reinforcement Learning with Continuous Controls

18 Sept 2024 (modified: 21 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Reinforcement Learning, Exploration, Thompson Sampling
TL;DR: This work proposes an approach to address the challenges that have limited the application of Thompson Sampling to RL with continuous controls.
Abstract: Balancing exploration and exploitation is crucial in reinforcement learning (RL). While Thompson Sampling (TS) is a sound and effective exploration strategy, its application to RL with high-dimensional continuous controls remains challenging. We propose Practical $\epsilon$-Exploring Thompson Sampling (PETS), a practical approach that addresses these challenges. Since the posterior over the parameters of the action-value function is intractable, we leverage Langevin Monte Carlo (LMC) for sampling. We propose an approach that maintains $n$ parallel Markov chains to mitigate the issues of a naïve application of LMC. The step following posterior sampling in TS involves finding the optimal action under the sampled model of the action-value function. We explore both gradient-based and gradient-free approaches to approximating the optimal action, supported by extensive experiments. Furthermore, to justify the use of gradient-based optimization for approximating the optimal action, we analyze the regret of TS in the RL setting with continuous controls and show that it achieves the best-known bound previously established for the discrete setting. Our empirical results demonstrate that PETS, as an exploration strategy, can be integrated with leading RL algorithms, enhancing their performance and stability on benchmark continuous control tasks.
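The abstract's core sampling step can be illustrated with a minimal sketch. The standard LMC update perturbs a gradient step with Gaussian noise, $\theta_{k+1} = \theta_k - \eta \nabla L(\theta_k) + \sqrt{2\eta}\,\xi_k$, and the paper's remedy of $n$ parallel chains amounts to running $n$ independent copies of this update and drawing one chain's parameters as the sampled model. All names below (`lmc_step`, the toy quadratic loss, the specific chain count and step size) are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def lmc_step(theta, grad_loss, step_size, rng):
    """One Langevin Monte Carlo update: gradient descent step plus
    injected Gaussian noise scaled by sqrt(2 * step_size)."""
    noise = rng.standard_normal(theta.shape)
    return theta - step_size * grad_loss(theta) + np.sqrt(2.0 * step_size) * noise

rng = np.random.default_rng(0)
n_chains, dim = 5, 3  # illustrative sizes, not the paper's choices

# Maintain n parallel chains over the parameters of a toy quadratic loss
# L(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
grad = lambda th: th
chains = rng.standard_normal((n_chains, dim))
for _ in range(1000):
    chains = np.stack([lmc_step(c, grad, 1e-2, rng) for c in chains])

# Thompson Sampling would then treat one chain's parameters as the
# sampled action-value model for the current episode.
sampled_theta = chains[rng.integers(n_chains)]
```

For this toy loss the chains mix toward a standard Gaussian posterior; in PETS the gradient would instead come from the action-value function's training loss.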
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1413