Shiftable Dynamic Policy Programming for Efficient and Robust Reinforcement Learning Control

Published: 2021 · Last Modified: 28 Jan 2026 · ROBIO 2021 · CC BY-SA 4.0
Abstract: In this paper, a novel value function-based reinforcement learning (RL) approach, Shiftable Dynamic Policy Programming (SDPP), is proposed to improve the sample efficiency and robustness of RL in control problems. Extending the earlier sample-efficient RL method Dynamic Policy Programming (DPP), which penalizes an over-large Kullback-Leibler (KL) divergence between the updated and previous policies through a penalty term, SDPP employs a shiftable parameter to dynamically control the penalty term according to historical learning performance, and designs a general shift strategy for this parameter. Evaluated on several benchmark control tasks in OpenAI Gym, based on the agent's behavior under various reward settings, SDPP demonstrates its ability to automatically select a suitable smoothness of policy update, and therefore achieves both faster convergence and better robustness than the original DPP.
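The abstract does not spell out the update rule, so the following minimal Python sketch only illustrates the general idea: a tabular DPP-style preference update in which the Boltzmann inverse temperature eta serves as the (inverse) KL-penalty strength, plus a hypothetical shift rule for eta. The function names, the windowed return comparison, and the multiplicative step size are illustrative assumptions, not the paper's actual shift strategy.

```python
import numpy as np

def softmax_value(prefs, eta):
    """Boltzmann soft-max (log-sum-exp) value of the action preferences
    at one state: (1/eta) * log(sum_a exp(eta * prefs[a])).
    eta is the inverse temperature; the KL penalty weight is 1/eta.
    """
    z = eta * np.asarray(prefs)
    m = z.max()  # shift by the max for numerical stability
    return (m + np.log(np.exp(z - m).sum())) / eta

def dpp_step(psi, s, a, r, s_next, eta, gamma=0.99, alpha=0.1):
    """Sampled DPP-style update of the action-preference table psi.

    The increment is a soft-consistency error; the induced Boltzmann
    policy pi(a|s) ~ exp(eta * psi[s, a]) stays close (in KL) to the
    previous policy when eta is small, i.e. when smoothing is strong.
    """
    td = r + gamma * softmax_value(psi[s_next], eta) - softmax_value(psi[s], eta)
    psi[s, a] += alpha * td

def shift_eta(eta, episode_returns, window=10, step=1.1):
    """Hypothetical shift rule (illustration only; the paper's actual
    strategy is not given in the abstract): raise eta (weaker smoothing)
    while recent returns keep improving, lower it (stronger smoothing,
    more conservative policy updates) otherwise.
    """
    if len(episode_returns) < 2 * window:
        return eta
    recent = np.mean(episode_returns[-window:])
    earlier = np.mean(episode_returns[-2 * window : -window])
    return eta * step if recent > earlier else eta / step
```

Under these assumptions, an outer training loop would call `dpp_step` per transition and `shift_eta` once per episode, so the smoothness of the policy update adapts to learning progress rather than staying fixed as in the original DPP.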