Reward Constrained Policy Optimization

2019 (modified: 24 Apr 2023) | ICLR (Poster) 2019 | Readers: Everyone
Abstract: For complex constraints whose gradient is not easy to estimate, we use the discounted penalty as a guiding signal. We prove that, under certain assumptions, the method converges to a feasible solution.
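
To make "discounted penalty as a guiding signal" concrete, here is a minimal sketch, not the authors' implementation. It assumes a per-step penalty c_t, a fixed penalty weight `lam`, and a shaped signal of the form r_t - lam * c_t. These names and the subtraction form are illustrative assumptions; the abstract does not specify them.

```python
def discounted_sum(values, gamma):
    """Return sum_t gamma**t * values[t] for a single trajectory."""
    total = 0.0
    # Accumulate backwards so each step applies one factor of gamma.
    for v in reversed(values):
        total = v + gamma * total
    return total


def penalized_return(rewards, penalties, gamma, lam):
    """Discounted return of the shaped per-step signal r_t - lam * c_t.

    `lam` is a hypothetical penalty weight, held fixed in this sketch.
    """
    if len(rewards) != len(penalties):
        raise ValueError("rewards and penalties must have the same length")
    shaped = [r - lam * c for r, c in zip(rewards, penalties)]
    return discounted_sum(shaped, gamma)


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    penalties = [0.0, 1.0, 0.0]  # the constraint is violated at step 1 only
    print(penalized_return(rewards, penalties, gamma=0.5, lam=2.0))
```

Because discounting is linear, the shaped return equals the discounted reward minus `lam` times the discounted penalty. The penalty can therefore guide learning through sampled returns alone, with no need for a gradient of the constraint.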