TD Learning with Neural Networks - Study of the Leakage Propagation Problem
Hugo Penedones, Damien Vincent, Timothy Mann, Sylvain Gelly
Feb 12, 2018 (modified: Feb 12, 2018) · ICLR 2018 Workshop Submission
Abstract: In on-policy evaluation, one estimates the value function of the data-generating policy with algorithms such as Monte-Carlo regression (MC) or Temporal-Difference learning (TD). We investigate the problem of poor value estimates that arise when using a function approximator such as a neural network, due to limited data, limited capacity, or the training process itself, and show how these approximation errors can be further propagated by TD bootstrap updates. We suggest that this problem may be mitigated by first learning, in an unsupervised manner, a representation that separates states which look similar but are actually far apart along the trajectories followed by the policy.
TL;DR: TD Learning with neural networks suffers from a leakage-propagation problem that may be partially mitigated by unsupervised representation learning.
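The mechanism can be illustrated with a hypothetical toy example (our own sketch, not code from the paper): two states A and B look identical to the function approximator (aliased features) but lie on different trajectories with different true values, so TD(0) is forced to average their estimates, and the bootstrap target r + γ·V(s') then propagates that error backward to their non-aliased predecessors X and Y. A representation that separates A and B removes the leakage.

```python
# Assumed toy setup: two deterministic trajectories,
#   X -> A -> terminal (reward 1 at the end), true V(X)=0.9, V(A)=1
#   Y -> B -> terminal (reward 0 at the end), true V(Y)=0.0, V(B)=0
# A and B are aliased under the first representation (identical features).
import numpy as np

GAMMA, ALPHA = 0.9, 0.05
TRAJS = [[("X", 0.0, "A"), ("A", 1.0, None)],
         [("Y", 0.0, "B"), ("B", 0.0, None)]]

def td_evaluate(phi, n_iters=3000):
    """Run TD(0) with a linear value function V(s) = phi[s] . w."""
    w = np.zeros(len(phi["X"]))
    v = lambda s: 0.0 if s is None else phi[s] @ w
    for _ in range(n_iters):
        for traj in TRAJS:
            for s, r, s_next in traj:
                target = r + GAMMA * v(s_next)        # TD bootstrap target
                w += ALPHA * (target - v(s)) * phi[s]  # semi-gradient update
    return {s: float(phi[s] @ w) for s in "XYAB"}

# Aliased representation: A and B share the same feature vector.
aliased = {"X": np.array([1., 0., 0.]), "Y": np.array([0., 1., 0.]),
           "A": np.array([0., 0., 1.]), "B": np.array([0., 0., 1.])}
# Separating representation: every state gets its own feature.
separated = {"X": np.array([1., 0., 0., 0.]), "Y": np.array([0., 1., 0., 0.]),
             "A": np.array([0., 0., 1., 0.]), "B": np.array([0., 0., 0., 1.])}

v_alias, v_sep = td_evaluate(aliased), td_evaluate(separated)
# With aliasing, A and B are averaged (near 0.5), and the bootstrap leaks
# that error into X and Y (both near 0.45 instead of 0.9 and 0.0).
print(v_alias)
# With separated features, TD recovers the true values.
print(v_sep)
```

Note that X and Y themselves are not aliased; their estimates are wrong only because the TD target bootstraps from the corrupted estimate of the aliased successor, which is the propagation effect the abstract describes.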