Policy Search by Dynamic Programming

2003 (modified: 11 Nov 2022), NIPS 2003
Abstract: We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem.