Adaptive Reward Penalty in Safe Reinforcement Learning

03 Feb 2023 (modified: 02 May 2023) · Submitted to Blogposts @ ICLR 2023 · Readers: Everyone
Keywords: Safe RL, RCPO, PPO
Abstract: In this blog post, we dive into the ICLR 2019 paper "Reward Constrained Policy Optimization" (RCPO) by Tessler et al. and highlight the importance of adaptive reward shaping in safe reinforcement learning. We reproduce the paper's experimental results by implementing RCPO on top of Proximal Policy Optimization (PPO). The post aims to give researchers and practitioners (1) a better understanding of safe reinforcement learning framed as constrained optimization and (2) an illustration of how penalized reward functions can be used effectively to train a robust policy.
Blogpost Url: https://iclr-blogposts.github.io/staging/blog/2023/Adaptive-Reward-Penalty-in-Safe-Reinforcement-Learning/
ICLR Papers: https://arxiv.org/abs/1805.11074
ID Of The Authors Of The ICLR Paper: ~Boris_Meinardus1
Conflict Of Interest: No
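For readers unfamiliar with the adaptive reward penalty the abstract refers to, the following is a minimal Python sketch of the core idea: the reward is shaped as r − λ·c, and the Lagrange multiplier λ is adapted on a slower timescale according to how far the observed constraint cost is from the allowed threshold. All names here (penalized_reward, update_penalty, penalty_lr, constraint_threshold) are illustrative assumptions, not taken from the blog post or the paper's code.

```python
# Minimal sketch of an adaptive reward penalty (Lagrange multiplier) update
# in the spirit of RCPO. Hyperparameter names and values are illustrative.

def penalized_reward(reward, cost, lam):
    """Shaped reward r - lambda * c, used by the inner policy-optimization loop."""
    return reward - lam * cost

def update_penalty(lam, mean_episode_cost, constraint_threshold, penalty_lr=0.01):
    """Slow outer-loop update: increase lambda while the constraint is violated,
    and let it decay toward zero once the policy satisfies the constraint."""
    lam = lam + penalty_lr * (mean_episode_cost - constraint_threshold)
    return max(0.0, lam)

# Toy usage: pretend the rollout cost shrinks as the penalty grows.
lam, threshold = 0.0, 0.2
for step in range(5):
    mean_cost = max(0.0, 0.8 - 0.3 * lam)  # stand-in for rollout cost statistics
    lam = update_penalty(lam, mean_cost, threshold)
    print(f"step {step}: mean_cost={mean_cost:.2f}, lambda={lam:.3f}")
```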