Off-Policy Evaluation for Action-Dependent Non-stationary Environments

Yash Chandak; Shiv Shankar; Nathaniel D. Bastian; Bruno Castro da Silva; Emma Brunskill; Philip S. Thomas

Off-Policy Evaluation for Action-Dependent Non-stationary Environments

Yash Chandak, Shiv Shankar, Nathaniel D. Bastian, Bruno Castro da Silva, Emma Brunskill, Philip S. Thomas

Published: 31 Oct 2022, Last Modified: 06 Apr 2025NeurIPS 2022 AcceptReaders: Everyone

Keywords: non-stationarity, off-policy, reinforcement learning, counterfactual

TL;DR: How to do off-policy evaluation from data collected amidst non-stationarity that depends on both exogenous factors and past actions/interactions?

Abstract: Methods for sequential decision-making are often built upon a foundational assumption that the underlying decision process is stationary. This limits the application of such methods because real-world problems are often subject to changes due to external factors (\textit{passive} non-stationarity), changes induced by interactions with the system itself (\textit{active} non-stationarity), or both (\textit{hybrid} non-stationarity). In this work, we take the first steps towards the fundamental challenge of on-policy and off-policy evaluation amidst structured changes due to active, passive, or hybrid non-stationarity. Towards this goal, we make a \textit{higher-order stationarity} assumption such that non-stationarity results in changes over time, but the way changes happen is fixed. We propose, OPEN, an algorithm that uses a double application of counterfactual reasoning and a novel importance-weighted instrument-variable regression to obtain both a lower bias and a lower variance estimate of the structure in the changes of a policy's past performances. Finally, we show promising results on how OPEN can be used to predict future performances for several domains inspired by real-world applications that exhibit non-stationarity.

Supplementary Material: pdf

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/off-policy-evaluation-for-action-dependent/code)

16 Replies

Loading