Reward Constrained Policy Optimization

Chen Tessler, Daniel J. Mankowitz, Shie Mannor

2019 (modified: 24 Apr 2023) | ICLR (Poster) 2019 | Readers: Everyone

Abstract: For complex constraints whose gradient is not easy to estimate, we use the discounted penalty as a guiding signal. We prove that, under certain assumptions, the resulting method converges to a feasible solution.
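To make the idea of a discounted penalty as a guiding signal concrete, here is a minimal sketch in Python. It is not the authors' implementation. The abstract gives no formulas, so the penalized reward `r - lam * c`, the multiplier update, and all names (`discounted_sum`, `penalized_rewards`, `update_multiplier`, `threshold`, `step_size`) are illustrative assumptions in the style of a Lagrangian relaxation.

```python
from typing import List


def discounted_sum(values: List[float], gamma: float) -> float:
    """Return sum_t gamma^t * values[t], accumulated from the last step backward."""
    total = 0.0
    for v in reversed(values):
        total = v + gamma * total
    return total


def penalized_rewards(rewards: List[float], penalties: List[float], lam: float) -> List[float]:
    """Assumed form of the guiding signal: per-step reward minus a weighted penalty."""
    return [r - lam * c for r, c in zip(rewards, penalties)]


def update_multiplier(lam: float, discounted_penalty: float,
                      threshold: float, step_size: float) -> float:
    """Assumed multiplier update: raise lam when the discounted penalty exceeds
    the threshold, lower it otherwise, and keep it non-negative."""
    return max(0.0, lam + step_size * (discounted_penalty - threshold))


# Hypothetical three-step trajectory with a unit penalty at every step.
penalties = [1.0, 1.0, 1.0]
rewards = [1.0, 2.0, 0.5]
lam = 0.0

penalty_signal = discounted_sum(penalties, gamma=0.5)
lam = update_multiplier(lam, penalty_signal, threshold=1.0, step_size=0.5)
shaped = penalized_rewards(rewards, penalties, lam)
```

In this sketch no gradient of the constraint is needed: only the scalar penalty observed along the trajectory is used, which is the property the abstract emphasizes.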
