SwiftTD: A Fast and Robust Algorithm for Temporal Difference Learning

Published: 15 May 2024 · Last Modified: 14 Nov 2024 · RLC 2024 · CC BY 4.0
Keywords: Reinforcement learning, Online learning, Meta-learning, Step-size optimization, Temporal difference learning
TL;DR: A fast and robust TD learning algorithm
Abstract: Learning to make temporal predictions is a key component of reinforcement learning algorithms. The dominant paradigm for learning predictions from an online stream of data is Temporal Difference (TD) learning. In this work, we introduce a new TD algorithm---SwiftTD---that learns more accurate predictions than existing algorithms. SwiftTD combines True Online TD($\lambda$) with per-feature step-size parameters, step-size optimization, a bound on the update to the eligibility vector, and step-size decay. Per-feature step-size parameters and step-size optimization improve credit assignment by increasing the step-size parameters of important signals and reducing them for irrelevant signals. The bound on the update to the eligibility vector prevents overcorrections. Step-size decay reduces step-size parameters when they are too large. We benchmark SwiftTD on the Atari Prediction Benchmark and show that, even with linear function approximation, it learns accurate predictions. We further show that SwiftTD performs well across a wide range of its hyperparameters. Finally, we show that SwiftTD can be used in the last layer of neural networks to improve their performance.
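For context on the base learner that SwiftTD extends, below is a minimal sketch of standard True Online TD($\lambda$) with linear function approximation and a single scalar step size. The function name, signature, and the episode-as-lists interface are illustrative assumptions; SwiftTD's additions (per-feature step sizes, step-size optimization, the bound on the eligibility update, and step-size decay) are described in the paper and are not reproduced here.

```python
import numpy as np

def true_online_td_lambda(features, rewards, gamma=0.99, lam=0.9, alpha=0.01):
    """Sketch of True Online TD(lambda) with linear function approximation.

    features : list of feature vectors x_0, ..., x_T (length len(rewards) + 1)
    rewards  : list of rewards r_1, ..., r_T, where rewards[t] follows features[t]
    Returns the learned weight vector. Uses a single scalar step size alpha;
    SwiftTD instead maintains and adapts one step size per feature.
    """
    n = len(features[0])
    w = np.zeros(n)      # weight vector
    z = np.zeros(n)      # dutch eligibility trace
    v_old = 0.0          # value estimate carried over from the previous step

    for t in range(len(rewards)):
        x, x_next = features[t], features[t + 1]
        v = w @ x
        v_next = w @ x_next
        delta = rewards[t] + gamma * v_next - v

        # Dutch-style eligibility-trace update.
        z = gamma * lam * z + (1.0 - alpha * gamma * lam * (z @ x)) * x

        # True-online weight update.
        w = w + alpha * (delta + v - v_old) * z - alpha * (v - v_old) * x
        v_old = v_next

    return w
```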
Submission Number: 111