Keywords: reinforcement learning, truncated gaussian distribution, policy distribution bias, continuous control
Abstract: In continuous action spaces, Proximal Policy Optimization (PPO) typically clips out-of-range actions to the action-space boundaries.
However, a Gaussian policy has unbounded support, so every sample that falls outside the bounds is clipped onto a boundary, which biases the policy toward boundary actions.
This bias significantly limits the effective exploration range of action selections, risking suboptimal behaviors.
In this paper, we introduce a truncated Gaussian as an alternative policy distribution to mitigate this bias and show how the choice of distribution affects exploration.
We find, however, that a plain truncated Gaussian policy introduces the opposite bias, favoring interior actions.
To balance this bias, we ultimately propose a scale-adjusted truncated Gaussian policy, where the distribution scale shrinks when the mean is near the boundaries.
This property makes boundary actions more deterministic than in a plain truncated Gaussian, but still less so than in the original Gaussian.
Empirical studies across various continuous control tasks demonstrate that truncated Gaussian policies significantly reduce boundary action usage, while scale-adjusted variants effectively balance the bias and counter-bias.
These methods generally outperform Gaussian policies and perform competitively with other bias-mitigation approaches.
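The three distributions discussed in the abstract can be illustrated with a small standard-library sketch. It is not the authors' implementation: the bounds `LOW` and `HIGH`, the helper names, and in particular the linear shrink rule in `adjusted_scale` (with its `floor` parameter) are assumptions made here for illustration, since the abstract does not state the actual adjustment formula.

```python
import random
from statistics import NormalDist

LOW, HIGH = -1.0, 1.0  # assumed action bounds, not taken from the paper


def clipped_boundary_mass(mu, sigma, low=LOW, high=HIGH):
    """Probability that a Gaussian sample falls outside [low, high]
    and is therefore clipped onto one of the two boundaries."""
    d = NormalDist(mu, sigma)
    return d.cdf(low) + (1.0 - d.cdf(high))


def sample_truncated(mu, sigma, rng, low=LOW, high=HIGH):
    """Draw one sample from a Gaussian truncated to [low, high]
    by inverse-CDF sampling."""
    d = NormalDist(mu, sigma)
    a, b = d.cdf(low), d.cdf(high)
    u = a + (b - a) * rng.random()
    # inv_cdf needs 0 < u < 1; guard against numerical edge cases.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    # Clamp only to absorb rounding error at the bounds.
    return min(max(d.inv_cdf(u), low), high)


def adjusted_scale(mu, sigma, low=LOW, high=HIGH, floor=0.1):
    """HYPOTHETICAL rule: shrink sigma linearly as the mean nears a bound.
    The paper's actual scale adjustment is not given in the abstract."""
    half_width = 0.5 * (high - low)
    mu_c = min(max(mu, low), high)
    # 1.0 when the mean is centered, 0.0 when it sits on a boundary.
    rel_dist = min(mu_c - low, high - mu_c) / half_width
    return sigma * (floor + (1.0 - floor) * rel_dist)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, sigma = 0.9, 0.5
    print("clipped boundary mass:", clipped_boundary_mass(mu, sigma))
    print("truncated sample:", sample_truncated(mu, sigma, rng))
    print("adjusted sigma:", adjusted_scale(mu, sigma))
```

With a mean near the upper bound, a large share of clipped Gaussian samples lands exactly on the boundary, whereas the truncated sampler never returns an out-of-range value and places no point mass on the bound. The hypothetical `adjusted_scale` narrows the truncated distribution near the bounds, which is the qualitative behavior the abstract describes for the scale-adjusted variant.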
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 16