Abstract: Colored noise, a class of temporally correlated noise processes, has
shown promising results for improving exploration in deep
reinforcement learning for both off-policy and on-policy
algorithms. However, it is unclear how temporally correlated colored
noise affects policy learning apart from changing exploration
properties. In this paper, we investigate the influence of colored
noise on the optimal policy in a simplified linear quadratic regulator
(LQR) setting. We show that the expected
trajectory remains independent of the noise color for a given linear policy. We derive a closed-form solution for the expected cost and find that the noise affects
both the expected cost and the optimal policy. The cost splits into two parts: a state-cost term equal to the cost of the unperturbed system, and a noise-cost term independent of the initial state. Far from the goal state, the state cost dominates and the effect of the noise is negligible: the optimal policy approaches that of the unperturbed system. Near the goal state, the noise cost dominates, changing the optimal policy.
Format: Long format (up to 8 pages + refs, appendix)
Publication Status: No
Submission Number: 66