Abstract: Rapid adaptation to changing environments is a long-standing goal of reinforcement learning. However, reinforcement learning faces great challenges in dynamic environments, especially those with continuous state–action spaces. In this article, we propose a systematic incremental reinforcement learning method via performance evaluation and policy perturbation (IRL-PEPP) to improve the adaptability of reinforcement learning algorithms in dynamic environments with continuous state–action spaces. The method consists of three parts: performance evaluation, policy perturbation, and importance weighting. First, in performance evaluation, we apply the optimal policy learned in the original environment to sample a few episodes in the new environment and use these samples to evaluate the policy's applicability there. Then, in policy perturbation, the policy is perturbed according to this applicability score to balance the tradeoff between exploration and exploitation in the new environment. Finally, importance weighting is applied to the collected samples to accelerate the adjustment of the policy. Experimental results on continuous control tasks demonstrate the feasibility and efficiency of the proposed IRL-PEPP method in comparison with existing state-of-the-art methods.
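To make the three components concrete, the following is a minimal Python sketch of one possible realization of the evaluate–perturb–reweight loop. It assumes a toy one-dimensional environment (`ToyEnv`), a linear policy, and simple heuristics for the applicability score, perturbation scale, and importance weights; all names and design choices here are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of an IRL-PEPP-style loop. ToyEnv, the linear policy,
# and all scoring heuristics are hypothetical stand-ins, not the paper's code.
import numpy as np

class ToyEnv:
    """Stand-in for a continuous-control environment (assumed interface)."""
    def __init__(self, drift=0.0):
        self.drift = drift      # models the change in the dynamic environment
        self.state = 0.0
    def reset(self):
        self.state = 0.0
        return self.state
    def step(self, action):
        self.state += action + self.drift + 0.01 * np.random.randn()
        reward = -abs(self.state)          # reward for staying near the origin
        return self.state, reward, abs(self.state) > 5.0

def policy(state, theta):
    """Linear policy 'learned' in the original environment."""
    return float(np.clip(theta * state, -1.0, 1.0))

def rollout_return(env, theta, horizon=50):
    """Run one episode with the given policy and return its total reward."""
    s, total = env.reset(), 0.0
    for _ in range(horizon):
        s, r, done = env.step(policy(s, theta))
        total += r
        if done:
            break
    return total

def evaluate_applicability(env, theta, old_return, worst_return, episodes=5):
    """Performance evaluation: sample a few episodes with the old policy in
    the new environment and normalize the return to an applicability in [0, 1]."""
    mean_return = np.mean([rollout_return(env, theta) for _ in range(episodes)])
    frac = (mean_return - worst_return) / (old_return - worst_return)
    return float(np.clip(frac, 0.0, 1.0))   # 1 = fully applicable, 0 = not at all

def adapt(env, theta, old_return, worst_return,
          iters=50, lr=0.5, sigma_max=0.4):
    """Policy perturbation + importance-weighted update (heuristic hill climbing)."""
    for _ in range(iters):
        a = evaluate_applicability(env, theta, old_return, worst_return)
        sigma = sigma_max * (1.0 - a)       # low applicability -> more exploration
        eps = np.random.randn()
        trial = theta + sigma * eps         # perturbed policy
        gain = rollout_return(env, trial) - rollout_return(env, theta)
        w = 1.0 + (1.0 - a)                 # weight new-environment data more
        if gain > 0:                        # keep the perturbation if it helps
            theta += lr * w * sigma * eps
    return theta

# Usage: the policy was tuned for drift=0.0 and must adapt to drift=0.3.
old_env, new_env = ToyEnv(drift=0.0), ToyEnv(drift=0.3)
theta = -0.8
theta = adapt(new_env, theta,
              old_return=rollout_return(old_env, theta), worst_return=-100.0)
```

Note how the applicability score couples the three parts: it is estimated from a few evaluation episodes, it scales the perturbation magnitude (no exploration when the old policy still works), and it sets the importance weights so that data from a strongly changed environment drives larger policy adjustments.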