Keywords: temporal difference learning, accelerated planning, PID controller
TL;DR: We used control theory to accelerate TD Learning and Q-Learning
Abstract: Long-horizon tasks, which have a large discount factor, pose a challenge for most conventional reinforcement learning (RL) algorithms. Algorithms such as Value Iteration and Temporal Difference (TD) Learning have a slow convergence rate and become inefficient in these tasks. When the transition distributions are given, PID VI was recently introduced to accelerate the convergence of Value Iteration using ideas from control theory. Inspired by this, we introduce PID TD Learning and PID Q-Learning algorithms for the RL setting, in which only samples from the environment are available. We give a theoretical analysis of the convergence of PID TD Learning and its acceleration compared to conventional TD Learning.
We also introduce a method for adapting PID gains in the presence of noise and empirically verify its effectiveness.
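The following is a minimal, illustrative sketch of what a tabular PID-style TD(0) update might look like, assuming the general PID Value Iteration recipe is applied to sampled TD errors; the function name, gain names (kappa_p, kappa_i, kappa_d), the integral decay beta, and the update structure are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def pid_td_update(V, z, V_prev, s, r, s_next, gamma, alpha,
                  kappa_p=1.0, kappa_i=0.0, kappa_d=0.0, beta=0.95):
    """One PID-flavoured TD(0) step on state s (illustrative sketch only).

    V      : current value estimates (1-D array over states)
    z      : running integral of TD errors (same shape as V)
    V_prev : value estimates from the previous step (used by the derivative term)
    """
    # Sampled Bellman residual (the usual TD error).
    td_error = r + gamma * V[s_next] - V[s]

    # Integral term: a decaying accumulator of past TD errors at state s.
    z = z.copy()
    z[s] = beta * z[s] + td_error

    # PID-style update: proportional (plain TD), integral, and derivative terms.
    V_new = V.copy()
    V_new[s] = V[s] + alpha * (kappa_p * td_error
                               + kappa_i * z[s]
                               + kappa_d * (V[s] - V_prev[s]))
    return V_new, z
```

With kappa_p = 1 and kappa_i = kappa_d = 0, this reduces to the ordinary TD(0) update, which is why the PID view is a strict generalization of standard TD Learning.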
Submission Number: 270