Abstract: Highlights
• We propose D-IV-TD(0) to correct the estimation bias in multi-agent reinforcement learning.
• We extend D-IV-TD(0) to D-IV-SA for a generalized finite-time performance analysis.
• We prove that the D-IV-TD(0) algorithm attains the same theoretical performance as D-IV-SA.
• We evaluate the performance of D-IV-TD(0) experimentally.