Inherent Exploration via Sampling for Stochastic Policies

Published: 06 Mar 2025, Last Modified: 24 Apr 2025, FPI-ICLR2025 Poster, License: CC BY 4.0
Keywords: online reinforcement learning, exploration, sampling, quasi-monte carlo
TL;DR: We propose a novel exploration strategy for reinforcement learning in continuous action spaces by controlling the sampling strategy of stochastic policies.
Abstract: In this paper, we propose a novel exploration strategy for reinforcement learning in continuous action spaces by controlling the sampling strategy of stochastic policies. The proposed method, Inherent Exploration via Sampling (IES), enhances exploration by diversifying actions through the selection of varied Gaussian inputs. IES leverages the inherent stochasticity of policies to improve exploration without relying on external bonuses. Furthermore, it integrates seamlessly with existing exploration methods while introducing negligible computational overhead. Theoretically, we prove that IES achieves $\mathcal{O}\left(\epsilon^{-3}\right)$ sample complexity under the actor-critic framework in continuous action spaces. Experimentally, we evaluate IES on Gaussian policies (e.g., Soft Actor-Critic, Proximal Policy Optimization) and consistency-based policies on continuous control benchmarks from MuJoCo, dm_control, and Isaac Gym. The results demonstrate that IES effectively enhances the exploration capabilities of different policies, thereby improving the convergence of various reinforcement learning algorithms.
Submission Number: 54
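The abstract and keywords suggest that IES diversifies the Gaussian inputs fed to a reparameterized stochastic policy, e.g., via quasi-Monte Carlo sampling. Below is a minimal illustrative sketch of that idea, not the authors' implementation; the function names and the choice of a scrambled Sobol sequence are assumptions made here for the example.

```python
# Illustrative sketch only: replace i.i.d. Gaussian noise in a reparameterized
# Gaussian policy with diversified Gaussian inputs from a quasi-Monte Carlo
# (Sobol) sequence. All names below are hypothetical, not the paper's API.
import numpy as np
from scipy.stats import norm, qmc


def qmc_gaussian_noise(batch_size: int, action_dim: int, seed: int = 0) -> np.ndarray:
    """Low-discrepancy points in [0, 1)^d mapped to N(0, 1) via the inverse CDF."""
    sampler = qmc.Sobol(d=action_dim, scramble=True, seed=seed)
    u = sampler.random(batch_size)      # quasi-uniform points in the unit cube
    return norm.ppf(u)                  # diversified Gaussian inputs


def sample_actions(mu: np.ndarray, log_std: np.ndarray, seed: int = 0) -> np.ndarray:
    """Reparameterized Gaussian policy a = tanh(mu + sigma * eps) with QMC noise."""
    eps = qmc_gaussian_noise(*mu.shape, seed=seed)
    return np.tanh(mu + np.exp(log_std) * eps)


# Example: 8 diversified actions for a 2-D continuous control task.
mu = np.zeros((8, 2))
log_std = np.full((8, 2), -0.5)
print(sample_actions(mu, log_std))
```

Mapping low-discrepancy points through the Gaussian inverse CDF spreads the sampled noise, and hence the actions, more evenly than independent draws, which is the kind of diversified exploration of the policy's inherent stochasticity that the abstract describes.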