Covariance Matrix Evolutionary Preference-based Policy Search for Robot Confrontation

Chenheng Zhang, Chuxi Xiao, Xian Guo

Published: 01 Jan 2022, Last Modified: 15 May 2025 | ACIRS 2022 | CC BY-SA 4.0

Abstract: In the robot confrontation task, the reward arrives only at the end of an episode, which makes learning an optimal policy very hard. Conventional deep reinforcement learning methods such as Deep Deterministic Policy Gradient (DDPG) perform poorly on such sparse-reward problems. To address this, this paper proposes a novel approach, Covariance Matrix Evolutionary Preference-based Policy Search (CMA-EPPS). Specifically, the robot confrontation task is first formulated as a preference-based task. An evolutionary preference-based reinforcement learning policy search approach is then developed. To improve search efficiency, covariance matrix adaptation is introduced, and to improve the optimization, a novel ranking method and a preference judging approach are also integrated. Finally, the proposed method is applied in the robot confrontation environment, and the results show that CMA-EPPS can outperform conventional methods such as DDPG.
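
The general idea of combining preference-based ranking with an adapted search covariance can be illustrated with a toy sketch. This is not the authors' CMA-EPPS implementation: the abstract does not specify the ranking method, the preference judging approach, or the exact covariance update, so everything below is an assumption made for illustration. In particular, `prefer` is a hypothetical preference oracle (here it compares distances to a hidden target vector standing in for "which policy won"), the ranking is a plain round-robin win count, and the covariance update is a simplified elite-based rule rather than full CMA-ES.

```python
import numpy as np

def prefer(a, b, target):
    """Hypothetical preference oracle: True if candidate a is preferred to b.
    Assumption: preference is decided by distance to a hidden target."""
    return np.linalg.norm(a - target) < np.linalg.norm(b - target)

def rank_by_preferences(candidates, target):
    """Order candidate indices by number of pairwise wins (round-robin)."""
    n = len(candidates)
    wins = np.zeros(n, dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if prefer(candidates[i], candidates[j], target):
                wins[i] += 1
            else:
                wins[j] += 1
    # Most wins first; stable sort keeps ties in sampling order.
    return np.argsort(-wins, kind="stable")

def preference_search(dim=4, pop_size=20, elite=5, generations=60, seed=0):
    """Toy evolutionary search driven only by pairwise preferences."""
    rng = np.random.default_rng(seed)
    target = np.ones(dim)   # stand-in for an unknown best policy
    mean = np.zeros(dim)    # current mean of the policy parameters
    cov = np.eye(dim)       # current search covariance
    for _ in range(generations):
        candidates = rng.multivariate_normal(mean, cov, size=pop_size)
        order = rank_by_preferences(candidates, target)
        best = candidates[order[:elite]]
        # Simplified covariance adaptation (assumption, not full CMA-ES):
        # blend the old covariance with the scatter of elite steps taken
        # from the old mean, plus a small jitter for numerical stability.
        steps = best - mean
        cov = 0.5 * cov + 0.5 * (steps.T @ steps) / elite + 1e-6 * np.eye(dim)
        mean = best.mean(axis=0)
    return mean, target
```

Calling `preference_search()` returns the final mean together with the hidden target, so the two can be compared. Note that the search loop never sees a numeric reward, only the outcome of pairwise comparisons, which is the property that makes preference-based formulations attractive when the reward is available only at the end of an episode.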