Abstract: Proximal Policy Optimization (PPO), a popular on-policy deep
reinforcement learning method, employs a stochastic policy for
exploration. In this paper, we propose a colored noise-based
stochastic policy variant of PPO. Previous research highlighted the
importance of temporal correlation in action noise for effective
exploration in off-policy reinforcement learning. Building on
this, we investigate whether correlated noise can also enhance
exploration in on-policy methods like PPO. We discovered that
correlated noise for action selection improves learning performance
and outperforms the currently popular uncorrelated white noise
approach in on-policy methods. Unlike in off-policy learning, where pink
noise proved highly effective, a noise color intermediate between white
and pink performed best for on-policy learning with PPO. We also examined
how the amount of data collected per update affects this result by
varying the number of parallel simulation environments and observed that
a larger number of parallel environments benefits from more
correlated noise. However, overall, we found four parallel
environments to work best. Due to the significant impact and ease of
implementation, we recommend switching to correlated noise as the
default noise source in PPO.
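
To illustrate the idea, here is a minimal sketch of how temporally correlated action noise could be drawn and used in place of independent Gaussian samples during rollouts. It assumes the common spectral-synthesis construction of colored noise with power spectral density proportional to 1/f^beta (beta = 0 white, beta = 1 pink); the function names, the NumPy-based implementation, the choice of beta = 0.5, and the placeholder policy outputs are illustrative assumptions, not the paper's exact code.

```python
import numpy as np

def colored_noise(beta, n_steps, n_dims, rng):
    """Sample an (n_steps, n_dims) noise sequence whose power spectral density
    along the time axis is proportional to 1/f**beta.
    beta = 0 reproduces white noise, beta = 1 pink noise
    (spectral synthesis; illustrative, not the paper's exact implementation)."""
    freqs = np.fft.rfftfreq(n_steps)
    amplitude = np.zeros_like(freqs)
    amplitude[1:] = freqs[1:] ** (-beta / 2.0)  # shape the power spectrum
    # Random complex spectrum with the desired amplitude profile.
    spectrum = amplitude[:, None] * (
        rng.standard_normal((len(freqs), n_dims))
        + 1j * rng.standard_normal((len(freqs), n_dims))
    )
    noise = np.fft.irfft(spectrum, n=n_steps, axis=0)
    # Rescale to unit variance per dimension so it can stand in for N(0, 1) samples.
    return noise / noise.std(axis=0, keepdims=True)

# Hypothetical rollout loop: the Gaussian policy's mean and std still come from
# the policy network, but the per-step perturbation is taken from one correlated
# sequence per rollout instead of independent N(0, 1) draws.
rng = np.random.default_rng(0)
rollout_len, act_dim = 2048, 6  # illustrative sizes
eps = colored_noise(beta=0.5, n_steps=rollout_len, n_dims=act_dim, rng=rng)
for t in range(rollout_len):
    mean_t, std_t = np.zeros(act_dim), np.ones(act_dim)  # placeholder policy outputs
    action_t = mean_t + std_t * eps[t]
```

Under this construction only the sampling of the perturbation changes; the log-probabilities used in the PPO objective would presumably still be evaluated under the Gaussian policy density, as in the prior colored-noise exploration work the abstract refers to.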