Pink Noise LQR: How does Colored Noise affect the Optimal Policy in RL?

Published: 17 Jun 2024, Last Modified: 01 Jul 2024
Venue: FoRLaC Poster
License: CC BY 4.0
Abstract: Colored noise, a class of temporally correlated noise processes, has shown promising results for improving exploration in deep reinforcement learning for both off-policy and on-policy algorithms. However, it is unclear how temporally correlated colored noise affects policy learning beyond changing exploration properties. In this paper, we investigate the influence of colored noise on the optimal policy in a simplified linear quadratic regulator (LQR) setting. We show that, for a given linear policy, the expected trajectory is independent of the noise color. We derive a closed-form solution for the expected cost and find that the noise affects both the expected cost and the optimal policy. The cost splits into two parts: a state-cost term equal to the cost of the unperturbed system, and a noise-cost term independent of the initial state. Far from the goal state, the state cost dominates and the effect of the noise is negligible: the policy approaches the optimal policy of the unperturbed system. Near the goal state, the noise cost dominates, changing the optimal policy.
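The claim that the expected trajectory is independent of the noise color can be checked numerically. The following is a minimal sketch, not the paper's setup: it uses 1-D dynamics, a fixed linear policy, and zero-mean AR(1) noise as a simple stand-in for temporally correlated (colored) noise; all variable names and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-D LQR-style dynamics x_{t+1} = a*x_t + b*u_t,
# with a fixed linear policy perturbed by action noise: u_t = k*x_t + eps_t.
a, b, k = 1.0, 1.0, -0.5
T, n_rollouts = 50, 20000
x0 = 5.0

def mean_trajectory(colored: bool, beta: float = 0.9) -> np.ndarray:
    """Monte Carlo estimate of E[x_t] under white or AR(1)-correlated noise.

    Both noise processes are zero-mean with unit stationary variance;
    only their temporal correlation differs.
    """
    x = np.full(n_rollouts, x0)
    eps = np.zeros(n_rollouts)
    traj = np.empty((T, n_rollouts))
    for t in range(T):
        white = rng.standard_normal(n_rollouts)
        if colored:
            # AR(1) noise: eps_t = beta*eps_{t-1} + sqrt(1-beta^2)*white_t
            eps = beta * eps + np.sqrt(1.0 - beta**2) * white
        else:
            eps = white
        x = a * x + b * (k * x + eps)
        traj[t] = x
    return traj.mean(axis=1)

mean_white = mean_trajectory(colored=False)
mean_colored = mean_trajectory(colored=True)

# Up to Monte Carlo error, the averaged trajectories coincide:
# the expected trajectory under a linear policy does not depend on
# the temporal correlation of the zero-mean action noise.
print(np.max(np.abs(mean_white - mean_colored)))
```

The expected *cost*, by contrast, does depend on the noise color: correlated noise changes the variance of the state along the trajectory even though its mean is unchanged, which is consistent with the cost decomposition described in the abstract.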
Format: Long format (up to 8 pages + refs, appendix)
Publication Status: No
Submission Number: 66