Centralized Cooperative Exploration Policy for Continuous Control Tasks

Chao Li, Chen Gong, Qiang He, Xinwen Hou, Yu Liu

Published: 2023, Last Modified: 10 Jan 2025AAMAS 2023EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Despite recent works making great progress in continuous control tasks, exploration in these tasks has remained insufficiently investigated. This paper proposes CCEP (C entralized C ooperative E xploration P olicy), which utilizes estimation biases of value functions to contribute to the exploration capacity. CCEP keeps two value functions initialized with different parameters, and generates diverse policies with multiple exploration styles from a pair of value functions. In addition, a centralized policy framework ensures that CCEP achieves message delivery between multiple policies, furthermore contributing to exploring the environment cooperatively. Extensive experimental results demonstrate that CCEP achieves higher exploration capacity. Empirical analysis shows diverse exploration styles in the learned policies by CCEP, reaping benefits in more exploration regions. Besides, the exploration capabilities of CCEP have been demonstrated to outperform current state-of-the-art methods on multiple continuous control tasks.