Provable causal distributed two-time-scale temporal-difference learning with instrumental variables

Published: 01 Jan 2025, Last Modified: 21 Jul 2025. Expert Syst. Appl. 2025. License: CC BY-SA 4.0
Abstract: Highlights
• We propose D-IV-TD(0) to correct the estimation bias in multi-agent reinforcement learning.
• We extend D-IV-TD(0) to D-IV-SA to obtain a generalized finite-time performance analysis.
• We prove that D-IV-TD(0) has the same theoretical performance as D-IV-SA.
• We evaluate the performance of D-IV-TD(0) through experiments.
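The highlights only name the algorithm. As a rough illustration, here is one common way a distributed TD(0) step with an instrumental variable can be written: agents first mix their parameters through a doubly stochastic consensus matrix, then apply a local TD(0) correction in which the TD error multiplies an instrument z rather than the current-state features. This is an assumption-laden sketch, not the authors' D-IV-TD(0). The mixing matrix, the instrument choice, the function names (`distributed_iv_td0_step`, `run_demo`), and the toy two-state MDP are all hypothetical, and a single step size is used, so the two-time-scale structure named in the title is not modeled.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def distributed_iv_td0_step(
    thetas: List[Vector],        # one linear value-function parameter vector per agent
    mixing: List[List[float]],   # doubly stochastic consensus matrix W (assumed)
    phi_s: Vector,               # features of the current state
    phi_next: Vector,            # features of the next state
    z: Vector,                   # instrument for this transition (hypothetical choice)
    rewards: Sequence[float],    # local reward observed by each agent
    gamma: float,
    alpha: float,
) -> List[Vector]:
    n, d = len(thetas), len(thetas[0])
    # 1) Consensus: agent i averages the parameters of all agents with weights W[i][j].
    mixed = [
        [sum(mixing[i][j] * thetas[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]
    new_thetas = []
    for i in range(n):
        # 2) Local TD error from agent i's own reward and its mixed parameters.
        delta = rewards[i] + gamma * dot(phi_next, mixed[i]) - dot(phi_s, mixed[i])
        # 3) IV-style correction: the TD error multiplies the instrument z instead of
        #    phi_s. Choosing z = phi_s recovers ordinary distributed TD(0).
        new_thetas.append([mixed[i][k] + alpha * delta * z[k] for k in range(d)])
    return new_thetas


def run_demo(steps: int = 5000) -> List[Vector]:
    # Hypothetical toy problem: deterministic cycle s0 -> s1 -> s0, one-hot features.
    features = [[1.0, 0.0], [0.0, 1.0]]
    rewards = [1.0, 3.0]                    # agent 0 sees reward 1, agent 1 sees reward 3
    mixing = [[0.75, 0.25], [0.25, 0.75]]   # doubly stochastic, so the network average is preserved
    thetas = [[0.0, 0.0], [0.0, 0.0]]
    s = 0
    for _ in range(steps):
        s_next = 1 - s
        thetas = distributed_iv_td0_step(
            thetas, mixing, features[s], features[s_next],
            z=features[s],                  # tabular instrument, so this reduces to plain TD(0)
            rewards=rewards, gamma=0.5, alpha=0.05,
        )
        s = s_next
    return thetas
```

Because W is doubly stochastic, the network-average parameter follows exact TD(0) on the average reward (here 2), so it converges to the value 2 / (1 - 0.5) = 4 in both states, while each agent stays within O(alpha) of that average. A non-trivial instrument (for example, lagged features) changes the fixed-point equation to E[z · delta] = 0; whether such a choice is stable depends on the problem, which is exactly the kind of question a finite-time analysis addresses.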