2020 (modified: 30 Mar 2022) | AISTATS 2020 | Readers: Everyone
Abstract: Estimation of importance sampling weights for off-policy evaluation of contextual bandits often results in imbalance, a mismatch between the desired and the actual distribution of state-action pairs...