TD Learning with Neural Networks - Study of the Leakage Propagation Problem

Hugo Penedones; Damien Vincent; Timothy Mann; Sylvain Gelly

12 Feb 2018 (modified: 05 May 2023) | ICLR 2018 Workshop Submission | Readers: Everyone

Abstract: In on-policy evaluation, one estimates the value function of the data-generating policy with algorithms such as Monte-Carlo regression (MC) or Temporal-Difference learning (TD). We investigate how these estimates degrade when the value function is represented by a function approximator such as a neural network, whether because of limited data, limited capacity, or the training process, and how TD bootstrap updates can propagate the resulting approximation errors further. We suggest that this problem may be mitigated by first learning, in an unsupervised way, a representation that separates states that look similar but are actually far apart along the trajectories followed by the policy.
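
To make the contrast between the two kinds of regression target concrete, here is a minimal sketch on a toy three-step trajectory. The trajectory, the reward values, the discount factor, and the function names `mc_targets` and `td0_targets` are all illustrative assumptions and are not taken from the paper. The sketch only shows the standard definitions: an MC target is the discounted return actually observed, while a TD(0) target bootstraps on the current estimate of the next state's value, so an error in that estimate enters the target of the preceding state scaled by the discount factor.

```python
def mc_targets(rewards, gamma):
    """Monte-Carlo targets: discounted return observed from each step onward."""
    targets = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        targets[t] = g
    return targets


def td0_targets(rewards, next_values, gamma):
    """TD(0) targets: r_t + gamma * V(s_{t+1}), using the current value estimates."""
    return [r + gamma * v for r, v in zip(rewards, next_values)]


# Hypothetical trajectory s0 -> s1 -> s2 -> terminal, reward 1 on the last step.
gamma = 0.9
rewards = [0.0, 0.0, 1.0]

# Value estimates of the successor states (s1, s2, terminal).
exact_next = [0.9, 1.0, 0.0]   # matches the true values on this trajectory
leaky_next = [1.4, 1.0, 0.0]   # assumed approximation error of +0.5 on V(s1)

mc = mc_targets(rewards, gamma)
td_exact = td0_targets(rewards, exact_next, gamma)
td_leaky = td0_targets(rewards, leaky_next, gamma)

# The error on V(s1) shows up in the TD target for s0, scaled by gamma.
propagated_error = td_leaky[0] - td_exact[0]
print(mc, td_exact, td_leaky, propagated_error)
```

With exact value estimates the two kinds of target coincide on this trajectory. With the perturbed estimate, only the TD target for `s0` changes, which is the one-step mechanism by which an approximation error in one state can reach the states that precede it.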

Keywords: policy evaluation, temporal difference learning, unsupervised learning, neural networks, machine learning

TL;DR: TD Learning with neural networks has leakage problems that may be partially mitigated by unsupervised learning
