Value Iteration Algorithm for Optimal Consensus Control of Multi-agent Systems

Qichao Zhang, Dongbin Zhao

2018 (modified: 07 Nov 2022), ICONIP (7) 2018

Abstract: In this paper, we investigate the optimal consensus control problem for multi-agent systems using the Heuristic Dynamic Programming (HDP) algorithm, a value iteration algorithm in reinforcement learning, under the centralized learning and decentralized execution framework. Unlike the independent learning framework, a centralized value function shared by all agents is defined. To approach the Nash equilibrium, we prove the equivalence between the Bellman optimality equation and the discrete-time Hamilton-Jacobi-Bellman (DTHJB) equation. For implementation, an actor-critic structure with neural network (NN) approximators is proposed to approximate the solution of the DTHJB equation, where the critic network shared by all agents is centralized and uses global information, while each agent's actor network is decentralized and uses only local information. Finally, simulation results are provided that demonstrate the effectiveness of the proposed HDP algorithm under the centralized learning and decentralized execution framework.
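
To make the value iteration idea concrete, the following is a minimal sketch under strong simplifying assumptions; it is not the paper's NN-based actor-critic implementation. It assumes two hypothetical single-integrator agents whose consensus error evolves linearly, a quadratic stage cost, and a quadratic centralized value function V_i(e) = e' P_i e, so that the HDP recursion V_{i+1}(e) = min_u [e'Qe + u'Ru + V_i(Ae + Bu)] can be carried out in closed form. The function name `hdp_value_iteration`, the matrices, and the step size `dt` are illustrative choices, not taken from the paper.

```python
import numpy as np

def hdp_value_iteration(A, B, Q, R, tol=1e-10, max_iter=10_000):
    """Value iteration for V_i(e) = e' P_i e, starting from V_0 = 0.

    Assumed setting (not from the paper): linear error dynamics
    e_{k+1} = A e_k + B u_k and stage cost e'Qe + u'Ru.
    """
    P = np.zeros_like(Q, dtype=float)  # V_0(e) = 0
    for _ in range(max_iter):
        # Actor step: greedy feedback u = -K e with respect to the current V_i.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Critic step: V_{i+1}(e) = stage cost + V_i(next error) under u = -K e.
        A_cl = A - B @ K
        P_next = Q + K.T @ R @ K + A_cl.T @ P @ A_cl
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    # Recompute the greedy gain for the final value function.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K

# Hypothetical example: two single-integrator agents with step size dt.
# The consensus error e = x1 - x2 evolves as e_{k+1} = e_k + dt * (u1 - u2).
dt = 0.1
A = np.array([[1.0]])
B = np.array([[dt, -dt]])   # column j is the effect of agent j's control
Q = np.array([[1.0]])       # penalty on the consensus error
R = np.eye(2)               # penalty on each agent's control effort

P, K = hdp_value_iteration(A, B, Q, R)
print("centralized value matrix P:", P)
print("per-agent feedback gains K:", K.ravel())
```

In this toy setting the single matrix `P` plays the role of the centralized critic, while row j of `K` gives agent j a feedback law that depends only on the relative state it can measure, which loosely mirrors the centralized learning and decentralized execution split described in the abstract.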
