Event Certifications: rl-conference.cc/RLC/2024/Journal_Track
Abstract: Many Deep Reinforcement Learning (D-RL) algorithms rely on simple forms of exploration
such as the additive action noise often used in continuous control domains. Typically,
the scaling factor of this action noise is chosen as a hyper-parameter and is kept constant
during training. In this paper, we focus on action noise in off-policy deep reinforcement
learning for continuous control. We analyze how the learned policy is impacted by the noise
type, the noise scale, and the schedule used to reduce the noise scaling factor. We consider the two most
prominent types of action noise, Gaussian and Ornstein-Uhlenbeck noise, and perform a vast
experimental campaign by systematically varying the noise type and scale parameter, and
by measuring variables of interest like the expected return of the policy and the state-space
coverage during exploration. For the latter, we propose a novel state-space coverage measure
$\operatorname{X}_{\mathcal{U}\text{rel}}$ that is more robust to estimation artifacts caused by points close to the
state-space boundary than previously proposed measures. Larger
noise scales generally increase state-space coverage. However, we find that increasing the
state-space coverage with a larger noise scale is often not beneficial. On the contrary, reducing
the noise scale over the training process reduces the variance and generally improves the
learning performance. We conclude that the best noise type and scale are environment-dependent
and, based on our observations, derive heuristic rules to guide the choice of action noise as a
starting point for further optimization.
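For readers unfamiliar with the two noise types studied in the paper, the sketch below illustrates how additive Gaussian and Ornstein-Uhlenbeck action noise are commonly generated, together with a linear reduction of the scaling factor over training. It is an illustrative sketch, not the submission's implementation: the class names, the parameter values (e.g., sigma, theta, dt), and the linear schedule are assumptions chosen for clarity.

import numpy as np

class GaussianActionNoise:
    """i.i.d. zero-mean Gaussian noise added independently at every step."""
    def __init__(self, action_dim, sigma=0.1):
        self.action_dim = action_dim
        self.sigma = sigma

    def sample(self):
        return self.sigma * np.random.randn(self.action_dim)


class OrnsteinUhlenbeckActionNoise:
    """Temporally correlated noise: x <- x + theta*(mu - x)*dt + sigma*sqrt(dt)*N(0, 1)."""
    def __init__(self, action_dim, sigma=0.1, theta=0.15, dt=1e-2):
        self.mu = np.zeros(action_dim)
        self.sigma, self.theta, self.dt = sigma, theta, dt
        self.x = np.zeros(action_dim)

    def sample(self):
        self.x = (self.x
                  + self.theta * (self.mu - self.x) * self.dt
                  + self.sigma * np.sqrt(self.dt) * np.random.randn(*self.mu.shape))
        return self.x


def linear_scale(step, total_steps, start=1.0, end=0.0):
    """Linearly reduce the noise scaling factor over the course of training."""
    frac = min(step / total_steps, 1.0)
    return start + frac * (end - start)


# Example: noisy action selection during exploration. The policy output is
# assumed to lie in [-1, 1]; clipping keeps the perturbed action in bounds.
noise = OrnsteinUhlenbeckActionNoise(action_dim=2, sigma=0.3)
for step in range(5):
    policy_action = np.zeros(2)                        # placeholder for the policy's action
    scale = linear_scale(step, total_steps=1_000_000)  # current noise scaling factor
    noisy_action = np.clip(policy_action + scale * noise.sample(), -1.0, 1.0)

A constant scaling factor corresponds to linear_scale with start == end; the paper compares such constant settings against schedules that reduce the scale during training.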
Certifications: Survey Certification
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Changes are highlighted in blue.
Code: https://github.com/jkbjh/code-action_noise_in_off-policy_d-rl
Assigned Action Editor: ~Adam_M_White1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 354