Provable causal distributed two-time-scale temporal-difference learning with instrumental variables

Yanan Ma, Jiamei Feng, Qingtao Wu, Ruijuan Zheng, Junlong Zhu, Jiangtao Xi, Mingchuan Zhang

Published: 2025, Last Modified: 21 Jul 2025, Expert Syst. Appl. 2025, License: CC BY-SA 4.0

Abstract: Highlights
• We propose D-IV-TD(0) to correct the estimation bias in multi-agent reinforcement learning.
• We extend D-IV-TD(0) to D-IV-SA for a generalized finite-time performance analysis.
• We prove that D-IV-TD(0) has the same theoretical performance as D-IV-SA.
• We evaluate the performance of D-IV-TD(0) through experiments.
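The highlights do not spell out the update rule, so the following is only a rough, hypothetical illustration of two ingredients the name D-IV-TD(0) refers to: distributed (consensus-based) TD(0) and instrumental variables. Each agent mixes its neighbours' linear value-function parameters through a doubly stochastic matrix W, then takes a local TD(0) step in which the TD error is multiplied by an instrument z instead of the feature vector φ. Everything here is an assumption and not the authors' algorithm: the function names (`iv_td_direction`, `distributed_iv_td`), the combine-then-adapt ordering, the toy data, and the use of exact batch (expected) updates instead of sampled transitions. It also uses a single step size, so it does not reproduce the two-time-scale structure or the paper's bias-correction analysis.

```python
# Hypothetical sketch: consensus + instrumental-variable linear TD(0) with batch updates.
# NOT the paper's D-IV-TD(0); single time scale, toy data, illustrative names only.
from typing import List, Sequence, Tuple

Vector = List[float]
# One transition: (features phi, next-state features phi_next, instrument z)
Transition = Tuple[Vector, Vector, Vector]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def iv_td_direction(theta: Vector, data: Sequence[Transition],
                    rewards: Sequence[float], gamma: float) -> Vector:
    """Batch average of z * delta, where delta is the TD(0) error under theta.

    Setting z = phi recovers the ordinary linear TD(0) direction.
    """
    d = len(theta)
    direction = [0.0] * d
    for (phi, phi_next, z), r in zip(data, rewards):
        delta = r + gamma * dot(phi_next, theta) - dot(phi, theta)
        for k in range(d):
            direction[k] += z[k] * delta / len(data)
    return direction


def distributed_iv_td(data: Sequence[Transition], agent_rewards: Sequence[Sequence[float]],
                      W: Sequence[Sequence[float]], gamma: float = 0.9,
                      alpha: float = 0.5, iters: int = 2000) -> List[Vector]:
    """Combine-then-adapt: mix parameters with W, then take a local IV-TD(0) step."""
    n, d = len(agent_rewards), len(data[0][0])
    thetas = [[0.0] * d for _ in range(n)]
    for _ in range(iters):
        # Consensus step: each agent takes a W-weighted average of all parameters.
        mixed = [[sum(W[i][j] * thetas[j][k] for j in range(n)) for k in range(d)]
                 for i in range(n)]
        # Local step: each agent uses only its own rewards; features and instruments are shared.
        thetas = [
            [m + alpha * g for m, g in
             zip(mixed[i], iv_td_direction(mixed[i], data, agent_rewards[i], gamma))]
            for i in range(n)
        ]
    return thetas


# Toy example: 3 agents, 2 transitions, 2-dimensional features.
data = [
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.5]),  # transition 1: phi, phi_next, z
    ([0.0, 1.0], [1.0, 0.0], [0.5, 1.0]),  # transition 2
]
agent_rewards = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]]  # agent_rewards[i][t]
W = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]  # doubly stochastic
thetas = distributed_iv_td(data, agent_rewards, W)
avg = [sum(t[k] for t in thetas) / len(thetas) for k in range(2)]
# avg is approximately (8.421, 8.246), the solution of A theta = b_bar for this data
```

Because W is doubly stochastic and all agents share the same features and instruments, the network average follows θ̄ ← θ̄ + α(b̄ − Aθ̄) exactly, where A = E[z(φ − γφ′)ᵀ] and b̄ is E[z r] with rewards averaged over agents. For a small enough α it therefore converges to the IV-TD fixed point Aθ = b̄, while individual agents keep a small, step-size-dependent disagreement. Choosing z = φ turns the sketch into plain distributed linear TD(0).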