Model-Free Algorithms for Cooperative Output Regulation of Discrete-Time Multiagent Systems via Q-Learning Method

Published: 2025, Last Modified: 13 May 2025. IEEE Trans. Cybern., 2025. License: CC BY-SA 4.0
Abstract: This article addresses the cooperative output regulation problem for discrete-time multiagent systems with unknown parameters, a challenge that arises in many practical applications where system models are unavailable. Unlike existing techniques, a model-free Q-learning algorithm is devised to obtain the optimal policy iteratively. The algorithm operates without knowledge of the system parameters, and its immediate-cost formulation removes the need to solve the regulator equations; as a result, it has a streamlined structure that allows the optimal policy to be determined directly. The stability of every iterate of the algorithm is then formally established, together with a uniqueness condition for the Q-function matrix. Additionally, to address the case in which the initial policy is unstable, a data-driven algorithm is introduced that computes initial stabilizing gains, so that stability is preserved throughout the learning process. It is further shown that neither the distributed observer nor the excitation noise introduces bias. Finally, the efficacy of the proposed algorithm is validated through two simulation examples.
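To illustrate the flavor of the approach, the following is a minimal sketch of model-free Q-learning policy iteration for a single discrete-time LQR problem, not the authors' cooperative algorithm: the toy matrices A_sim, B_sim, the cost weights, the exploration-noise level, and the stabilizing initial gain K0 are all illustrative assumptions (the paper's separate data-driven initialization handles the case where no stabilizing gain is known). The simulator matrices are used only to generate data; the learner itself never reads them.

```python
import numpy as np

rng = np.random.default_rng(0)

def quad_features(z):
    """phi(z) such that z' H z = theta . phi(z), with theta holding H_ii and 2*H_ij."""
    i, j = np.triu_indices(z.size)
    return z[i] * z[j]

def theta_to_H(theta, dim):
    """Rebuild the symmetric Q-function matrix H from its parameter vector theta."""
    H = np.zeros((dim, dim))
    i, j = np.triu_indices(dim)
    H[i, j] = theta
    return (H + H.T) / 2.0          # off-diagonal entries were stored as 2*H_ij

def rollout(K, A, B, Q, R, steps, noise=0.1):
    """Collect (phi(z_k) - phi(z_{k+1}), stage cost) pairs under u_k = -K x_k + e_k."""
    n = A.shape[0]
    x = rng.standard_normal(n)
    rows, costs = [], []
    for _ in range(steps):
        u = -K @ x + noise * rng.standard_normal(K.shape[0])   # exploration noise
        x_next = A @ x + B @ u
        z = np.concatenate([x, u])
        # On-policy successor uses -K x_{k+1}, not the noisy input, so the
        # excitation noise does not bias the Bellman target.
        z_next = np.concatenate([x_next, -K @ x_next])
        rows.append(quad_features(z) - quad_features(z_next))
        costs.append(x @ Q @ x + u @ R @ u)
        x = x_next
    return np.array(rows), np.array(costs)

def q_learning(A, B, Q, R, K0, iters=15, steps=400):
    n, m = B.shape
    K = K0.copy()
    for _ in range(iters):
        # Policy evaluation: least-squares fit of H from the Bellman equation
        #   z_k' H z_k - z_{k+1}' H z_{k+1} = x_k' Q x_k + u_k' R u_k.
        Phi, c = rollout(K, A, B, Q, R, steps)
        theta, *_ = np.linalg.lstsq(Phi, c, rcond=None)
        H = theta_to_H(theta, n + m)
        # Policy improvement from the Q-function matrix alone: K <- H_uu^{-1} H_ux.
        K = np.linalg.solve(H[n:, n:], H[n:, :n])
    return K, H

# Toy system used only to simulate data; the learner never reads A_sim, B_sim.
A_sim = np.array([[1.0, 0.1], [0.0, 1.0]])
B_sim = np.array([[0.0], [0.1]])
Qc, Rc = np.eye(2), np.eye(1)
K0 = np.array([[1.0, 2.0]])        # assumed stabilizing initial gain
K_star, H_star = q_learning(A_sim, B_sim, Qc, Rc, K0)
print("learned feedback gain:", K_star)
```

The sketch highlights two points echoed in the abstract: the policy update is computed directly from the estimated Q-function matrix without solving any model-based equations, and the update equations remain unbiased under exploration noise because the successor term is evaluated on-policy.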