Keywords: temporal difference learning, accelerated planning, PID controller
TL;DR: We used control theory to accelerate TD Learning and Q-Learning
Abstract: Long-horizon tasks, which have a large discount factor, pose a challenge for most conventional reinforcement learning (RL) algorithms. Algorithms such as Value Iteration and Temporal Difference (TD) Learning have a slow convergence rate and become inefficient in these tasks. When the transition distributions are given, PID VI was recently introduced to accelerate the convergence of Value Iteration using ideas from control theory. Inspired by this, we introduce PID TD Learning and PID Q-Learning algorithms for the RL setting, in which only samples from the environment are available. We give a theoretical analysis of the convergence of PID TD Learning and its acceleration compared to conventional TD Learning.
We also introduce a method for adapting PID gains in the presence of noise and empirically verify its effectiveness.
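The following is a minimal, illustrative sketch of what a tabular PID-style TD(0) update might look like, assuming the general PID Value Iteration recipe is applied to sampled TD errors; the function name, gain names (kappa_p, kappa_i, kappa_d), the integral decay beta, and the update structure are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def pid_td_update(V, z, V_prev, s, r, s_next, gamma, alpha,
                  kappa_p=1.0, kappa_i=0.0, kappa_d=0.0, beta=0.95):
    """One PID-flavoured TD(0) step on state s (illustrative sketch only).

    V      : current value estimates (1-D array over states)
    z      : running integral of TD errors (same shape as V)
    V_prev : value estimates from the previous step (used by the derivative term)
    """
    # Sampled Bellman residual (the usual TD error).
    td_error = r + gamma * V[s_next] - V[s]

    # Integral term: a decaying accumulator of past TD errors at state s.
    z = z.copy()
    z[s] = beta * z[s] + td_error

    # PID-style update: proportional (plain TD), integral, and derivative terms.
    V_new = V.copy()
    V_new[s] = V[s] + alpha * (kappa_p * td_error
                               + kappa_i * z[s]
                               + kappa_d * (V[s] - V_prev[s]))
    return V_new, z
```

With kappa_p = 1 and kappa_i = kappa_d = 0, this reduces to the ordinary TD(0) update, which is why the PID view is a strict generalization of standard TD Learning.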
Submission Number: 270