Abstract: Off-policy Actor-Critic algorithms have demonstrated remarkable empirical performance, but a clear explanation of why they work is still lacking. To this end, we show that their policy evaluation error on the distribution of transitions decomposes into three terms: a Bellman error, a bias arising from policy mismatch, and a variance term induced by sampling. By comparing the magnitudes of the bias and variance terms, we explain the success of the Emphasizing Recent Experience (ERE) sampling scheme and of 1/age-weighted sampling. Both strategies yield smaller bias and variance than uniform sampling and are therefore preferable.
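To make the two sampling schemes mentioned in the abstract concrete, below is a minimal replay-buffer sketch contrasting uniform sampling, 1/age-weighted sampling, and ERE-style sampling. The class name, method names, and the hyperparameter values (`eta`, `c_min`) are illustrative assumptions, not the paper's reference implementation; the ERE schedule follows the commonly cited form c_k = max(N * eta^(k * 1000 / K), c_min).

```python
import numpy as np

class ReplayBuffer:
    """Illustrative replay buffer with uniform, 1/age-weighted,
    and ERE-style sampling (names and defaults are assumptions)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []       # transitions, oldest first
        self.insert_step = []   # global step at which each transition was stored
        self.step = 0

    def add(self, transition):
        if len(self.storage) == self.capacity:
            self.storage.pop(0)
            self.insert_step.pop(0)
        self.storage.append(transition)
        self.insert_step.append(self.step)
        self.step += 1

    def sample_uniform(self, batch_size):
        idx = np.random.randint(len(self.storage), size=batch_size)
        return [self.storage[i] for i in idx]

    def sample_inverse_age(self, batch_size):
        # P(i) proportional to 1 / age(i), where age = steps since insertion (>= 1),
        # so recent transitions are drawn more often than old ones.
        age = self.step - np.asarray(self.insert_step)
        weights = 1.0 / np.maximum(age, 1)
        probs = weights / weights.sum()
        idx = np.random.choice(len(self.storage), size=batch_size, p=probs)
        return [self.storage[i] for i in idx]

    def sample_ere(self, batch_size, k, num_updates, eta=0.996, c_min=5000):
        # Emphasizing Recent Experience: the k-th of num_updates gradient updates
        # samples uniformly from only the most recent c_k transitions, where c_k
        # shrinks geometrically from the full buffer toward c_min.
        n = len(self.storage)
        c_k = max(int(n * eta ** (k * 1000.0 / num_updates)), min(c_min, n))
        idx = np.random.randint(n - c_k, n, size=batch_size)
        return [self.storage[i] for i in idx]
```

Both non-uniform schemes concentrate probability mass on recently collected transitions, which is the mechanism the paper's bias/variance comparison credits for their advantage over uniform sampling.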