Efficient and Stable Off-policy Training via Behavior-aware Evolutionary Learning

16 Jun 2022, 10:45 (modified: 07 Nov 2022, 03:27) · CoRL 2022 Poster
Student First Author: yes
Keywords: Continuous control, Reinforcement learning, Evolution strategies
TL;DR: Behavior-aware Evolutionary Learning (BEL), an evolutionary training framework for off-policy reinforcement learning inspired by evolution strategies (ES).
Abstract: Applying reinforcement learning (RL) algorithms to real-world continuous control problems faces many challenges in terms of sample efficiency, stability, and exploration. Off-policy RL algorithms show great sample efficiency but can be unstable to train and require effective exploration techniques for sparse reward environments. A simple yet effective approach to address these challenges is to train a population of policies and ensemble them in certain ways. In this work, a novel population-based evolutionary training framework inspired by evolution strategies (ES), called Behavior-aware Evolutionary Learning (BEL), is proposed. The main idea is to train a population of behaviorally diverse policies in parallel and conduct selection with simple linear recombination. BEL consists of two mechanisms, behavior-regularized perturbation (BRP) and behavior-targeted training (BTT), which accomplish stable and fine-grained control of the population's behavior divergence. Experimental studies showed that BEL not only has superior sample efficiency and stability compared to existing methods, but can also produce diverse agents in sparse reward environments. Due to its parallel implementation, BEL also exhibits relatively good computational efficiency, making it a practical and competitive method for training policies for real-world robots.
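The core loop described in the abstract — perturb a policy while keeping its behavior divergence inside a target band, then select via linear recombination — can be illustrated with a minimal toy sketch. This is not the authors' implementation: the quadratic fitness, the parameter-space behavior descriptor, and the distance band `D_MIN`/`D_MAX` are all illustrative stand-ins (in BEL the behavior descriptor would come from the agent's rollouts, and training is off-policy RL rather than direct parameter search).

```python
import numpy as np

rng = np.random.default_rng(0)

DIM = 5                    # toy policy parameter dimension
POP = 8                    # population size
SIGMA = 0.1                # perturbation scale
D_MIN, D_MAX = 0.05, 0.5   # hypothetical behavior-distance band (assumption)

def fitness(theta):
    # Toy stand-in for episodic return: maximized at theta = 1.
    return -np.sum((theta - 1.0) ** 2)

def behavior(theta):
    # Toy behavior descriptor: the parameters themselves; in BEL this
    # would be a statistic of the agent's state/action visitation.
    return theta

def behavior_regularized_perturb(theta):
    # Sketch of the BRP idea: resample until the perturbed policy's
    # behavior distance to the parent lies inside the target band.
    for _ in range(100):
        cand = theta + SIGMA * rng.standard_normal(DIM)
        d = np.linalg.norm(behavior(cand) - behavior(theta))
        if D_MIN <= d <= D_MAX:
            return cand
    return cand  # fall back to the last sample if the band is never hit

theta = np.zeros(DIM)
for gen in range(50):
    pop = [behavior_regularized_perturb(theta) for _ in range(POP)]
    scores = np.array([fitness(p) for p in pop])
    # Selection via simple linear recombination: a fitness-ranked
    # weighted average of the population (standard ES-style weights).
    order = np.argsort(scores)[::-1]
    weights = np.maximum(np.log(POP / 2 + 1) - np.log(np.arange(1, POP + 1)), 0)
    weights /= weights.sum()
    theta = sum(w * pop[i] for w, i in zip(weights, order))

print(round(float(fitness(theta)), 3))
```

The distance band plays the role described for BRP: it keeps offspring behaviorally distinct from the parent (lower bound) without drifting so far that training destabilizes (upper bound).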
Supplementary Material: zip
Code: https://github.com/raymond-myc/BEL