Neural Rewards Regression for near-optimal policy identification in Markovian and partial observable environmentsDownload PDF

31 Mar 2022OpenReview Archive Direct UploadReaders: Everyone
Abstract: Neural Rewards Regression (NRR) is a generalisation of Temporal Difference Learning (TD-Learning) and Approximate Q-Iteration with Neural Networks. The method allows to trade between these two techniques as well as between approaching the fixed point of the Bellman iteration and minimising the Bellman residual. NRR explicitly finds a near-optimal Q-function without an algorithmic framework except Back Propagation for Neural Networks. We further extend the approach by a recurrent substructure to Recurrent Neural Rewards Regression (RNRR) for partial observable environments or higher order Markov Decision Processes. It allows to transport past information to the present and the future in order to reconstruct the Markov property.
0 Replies

Loading