Exploring Exploration: A Comparative Analysis of Colored Noise Strategies in Reinforcement Learning

TMLR Paper2255 Authors

17 Feb 2024 (modified: 07 May 2024), under review for TMLR
Abstract: Reinforcement learning algorithms in general, and off-policy agents operating in continuous control spaces in particular, often induce exploration by adding noise to their action selection process. Popular implementations mostly use uncorrelated Gaussian (white) noise or temporally correlated Ornstein-Uhlenbeck (OU) noise, which is closely related to red noise. Recent work proposes pink noise, which lies halfway between white and OU noise, as the default action noise type, and claims that it is a better default than noise schedulers, which are algorithms that vary the level of temporal correlation as learning progresses. In this paper, we attempt to verify these claims and present an analysis of colored noise exploration, comparing various strategies for integrating noise. We further attempt to identify the effect of using spatially and temporally correlated noise to achieve exploration. The code and samples are provided in the supplementary material.
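To make the noise "colors" named in the abstract concrete, the following is a minimal sketch of one common way to sample colored noise: shaping a random spectrum so that its power falls off as 1/f^beta. It is not taken from the paper's supplementary code. The function names `colored_noise` and `lag1_autocorr` and the parameter `beta` are assumptions introduced here for illustration, with beta = 0 giving white noise, beta = 1 pink noise, and beta = 2 red noise (used here as a stand-in for OU-like noise).

```python
import numpy as np


def colored_noise(beta, n_steps, rng=None):
    """Sample n_steps values whose power spectrum scales roughly as 1/f**beta.

    Illustrative sketch only: beta=0 is white, beta=1 pink, beta=2 red noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    freqs = np.fft.rfftfreq(n_steps)        # non-negative frequencies; freqs[0] == 0
    scale = np.zeros_like(freqs)
    scale[1:] = freqs[1:] ** (-beta / 2.0)  # amplitude ~ f^(-beta/2); DC term is dropped
    # Random complex spectrum, shaped by the power-law amplitudes.
    spectrum = scale * (
        rng.standard_normal(freqs.size) + 1j * rng.standard_normal(freqs.size)
    )
    signal = np.fft.irfft(spectrum, n=n_steps)
    return signal / signal.std()            # normalise to unit variance


def lag1_autocorr(x):
    """Correlation between consecutive samples, a rough measure of temporal correlation."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]


rng = np.random.default_rng(0)
white = colored_noise(0.0, 4096, rng)
pink = colored_noise(1.0, 4096, rng)
red = colored_noise(2.0, 4096, rng)
```

In this sketch, the lag-1 autocorrelation grows with `beta`: white noise is close to uncorrelated from step to step, red noise is strongly correlated, and pink noise sits in between. In an agent, one such sequence per action dimension could be scaled and added to the policy's actions over an episode; how the noise is actually integrated is the subject of the paper itself.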
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Pablo_Sprechmann1
Submission Number: 2255