HR-TD: A Regularized TD Method to Avoid Over-Generalization

Ishan Durugkar; Bo Liu; Peter Stone

27 Sept 2018 (modified: 05 May 2023) | ICLR 2019 Conference Blind Submission | Readers: Everyone

Abstract: Temporal Difference (TD) learning with function approximation has been widely used in recent years and has led to several successful results. However, compared with the original tabular methods, one major drawback of TD learning with neural networks and other function approximators is that these approximators tend to over-generalize across temporally successive states, resulting in slow convergence and even instability. In this work, we propose a novel TD learning method, Hadamard product Regularized TD (HR-TD), that reduces over-generalization and thus leads to faster convergence. This approach can be easily applied to both linear and nonlinear function approximators. HR-TD is evaluated on several linear and nonlinear benchmark domains, where we show improvements in learning behavior and performance.
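The over-generalization described above can be seen in a single step of standard linear semi-gradient TD(0). This is a minimal sketch of the plain TD baseline, not of HR-TD: the abstract does not specify the form of the Hadamard product regularizer, so none is attempted here. The feature vectors, step size `alpha`, discount `gamma`, and helper names `value`/`td0_update` are hypothetical choices for illustration. Because the successive states s and s' share a feature, updating V(s) toward its bootstrapped target also moves V(s'), which is the very quantity that target depends on.

```python
def value(theta, phi):
    """Linear value estimate V(s) = theta . phi(s)."""
    return sum(t * f for t, f in zip(theta, phi))


def td0_update(theta, phi_s, phi_next, reward, alpha=0.1, gamma=0.99):
    """One semi-gradient TD(0) step (plain TD, no HR-TD regularization)."""
    # TD error: delta = r + gamma * V(s') - V(s)
    delta = reward + gamma * value(theta, phi_next) - value(theta, phi_s)
    # Only the gradient w.r.t. V(s) is followed; V(s') is treated as a fixed target.
    return [t + alpha * delta * f for t, f in zip(theta, phi_s)]


# Hypothetical features: s and s' both activate feature index 1.
phi_s = [1.0, 1.0, 0.0]
phi_next = [0.0, 1.0, 1.0]
theta = [0.0, 0.0, 0.0]

theta_new = td0_update(theta, phi_s, phi_next, reward=1.0)
# V(s) rises as intended, but V(s') rises too via the shared weight.
print(value(theta_new, phi_s), value(theta_new, phi_next))  # prints 0.2 0.1
```

With disjoint features for s and s' (no shared index), the same update leaves V(s') at zero; the spillover comes entirely from weights shared between temporally successive states.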

Keywords: Reinforcement Learning, TD Learning, Deep Learning

TL;DR: A regularization technique for TD learning that avoids temporal over-generalization, especially in deep networks.

Data: [OpenAI Gym](https://paperswithcode.com/dataset/openai-gym)
