Model parameter adaptive instance-based policy optimization for episodic control tasks of nonholonomic systems

Kyotaro Ohashi, Natsuki Fujiyoshi, Naoki Sakamoto, Youhei Akimoto

2018 (modified: 03 Nov 2022), GECCO (Companion) 2018

Abstract: Evolutionary Computation (EC) is attracting growing attention in Reinforcement Learning (RL), with successful applications such as robot control. The Instance-Based Policy (IBP) is a promising alternative to policy representations based on Artificial Neural Networks (ANNs). The IBP has been reported to be superior to continuous policy representations such as ANNs in the stabilization control of nonholonomic systems, owing to its bang-bang type control and its understandability. A difficulty in applying EC-based policy optimization to an RL task is choosing appropriate hyper-parameters, such as the network structure in ANNs and the parameters of the EC algorithm. The same applies to the IBP, where the critical parameter is the number of instances, which determines mode flexibility. In this paper, we propose a novel RL method that combines the IBP representation with optimization by the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), a state-of-the-art general-purpose search algorithm for black-box continuous optimization. The proposed method, called IBP-CMA, is a direct policy search that adapts the number of instances during the learning process and activates instances that do not contribute to the output. In simulation, IBP-CMA is compared with an ANN-based RL method, CMA-TWEANN.
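
To make the idea of an instance-based policy concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a policy is a set of (state, action) instances, that the action of the instance nearest to the current state (Euclidean distance) is returned, and that the instances are packed into one flat real-valued vector of the kind a black-box optimizer such as CMA-ES could search over. The names `unflatten` and `ibp_action`, the flat layout, and the example values are hypothetical; adapting the number of instances is not shown.

```python
import math


def unflatten(theta, state_dim, action_dim):
    """Split a flat parameter vector into (state, action) instances.

    Assumed layout (hypothetical): [s_1, a_1, s_2, a_2, ...].
    """
    width = state_dim + action_dim
    if len(theta) % width != 0:
        raise ValueError("theta length must be a multiple of state_dim + action_dim")
    return [
        (tuple(theta[i:i + state_dim]), tuple(theta[i + state_dim:i + width]))
        for i in range(0, len(theta), width)
    ]


def ibp_action(state, instances):
    """Return the action stored with the instance nearest to `state`."""
    _, action = min(instances, key=lambda inst: math.dist(inst[0], state))
    return action


# Two illustrative instances in a 2-D state space with a 1-D action.
# The output switches abruptly between stored actions, which is one way
# a nearest-instance policy can produce bang-bang style control.
theta = [-1.0, 0.0, 1.0,
         1.0, 0.0, -1.0]
instances = unflatten(theta, state_dim=2, action_dim=1)
```

Under these assumptions, an optimizer would evaluate a candidate `theta` by running an episode with `ibp_action` and scoring the return; the number of instances is fixed here by the length of `theta`.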
