Abstract: Constrained reinforcement learning (RL) algorithms have attracted extensive attention for tackling sequential decision-making problems with constraints defined under various risk measures. However, most existing works search only within the class of stationary policies and fail to capture a simple intuition: the action-selecting distribution at each state should be adjusted according to the cost accumulated so far. In this work, we design a novel quantile-level-driven policy class that fully realizes this intuition, within which each policy additionally takes the quantile level of the accumulated cost as input. This quantile level is obtained via a novel Invertible Backward Distributional Critic (IBDC) framework, which uses invertible function approximators to estimate the accumulated cost distribution and outputs the required quantile level through their inverse forms. Furthermore, the estimated accumulated cost distribution helps decompose the challenging trajectory-level constraints into state-level constraints, and a Risk-Aware Constrained RL (RAC) algorithm is then designed to solve the decomposed problem with Lagrangian multipliers. Experimental results in various environments validate the effectiveness of RAC against state-of-the-art baselines.
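To make the quantile-level-driven policy idea concrete, the sketch below (a minimal illustration with hypothetical names, not the paper's implementation) shows a policy that conditions on both the state and the quantile level of the accumulated cost; the IBDC inverse is stood in for by a simple empirical-CDF estimate.

```python
# Minimal sketch (assumed interface, not the paper's code): a policy that
# additionally conditions on the quantile level of the accumulated cost.
import torch
import torch.nn as nn

class QuantileLevelPolicy(nn.Module):
    """Policy pi(a | s, tau), where tau is the quantile level of the
    cost accumulated so far under an estimated cost distribution."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.Tanh(),  # +1 input for tau
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state: torch.Tensor, tau: torch.Tensor) -> torch.Tensor:
        # Concatenate the state with the quantile level and return action logits.
        return self.net(torch.cat([state, tau.unsqueeze(-1)], dim=-1))

def quantile_level(accumulated_cost: torch.Tensor,
                   cost_samples: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for the IBDC inverse: the quantile level of each
    accumulated cost as its empirical CDF value under sampled cost returns."""
    return (cost_samples.unsqueeze(0)
            <= accumulated_cost.unsqueeze(1)).float().mean(dim=1)
```

In the paper, the quantile level would instead be produced by inverting the invertible distributional critic; the empirical-CDF stand-in above only illustrates the interface between the critic and the quantile-level-augmented policy.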