Keywords: Off-Policy Evaluation, Interpretability, Concept Bottleneck Models, Reliable OPE
Abstract: Evaluating off-policy decisions using batch data is challenging because limited sample sizes lead to high variance. Identifying and addressing the sources of this variance is crucial for improving off-policy evaluation (OPE) in practice. Recent research on Concept Bottleneck Models (CBMs) shows that using human-explainable concepts can improve predictions and provide additional context for understanding decisions. In this paper, we propose incorporating an analogous notion of concepts into OPE to provide additional context that can help identify specific areas where variance is high. We introduce a family of new concept-based OPE estimators and show that these estimators have two key properties when the concepts are known in advance: they remain unbiased while reducing the variance of the overall estimates. Since real-world applications often lack predefined concepts, we further develop an end-to-end algorithm to learn interpretable, concise, and diverse concepts optimized for variance reduction in OPE. Our experiments on synthetic and real-world datasets show that both known and learned concept-based estimators significantly improve OPE performance. Crucially, our concept-based estimators offer two advantages over existing OPE methods. First, they are easily interpretable. Second, they allow us to isolate the specific concepts that contribute to variance. By performing targeted interventions on these concepts, we can further improve the quality of OPE estimates.
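To make the "unbiased while reducing variance" claim concrete, here is a minimal sketch of the general idea in a contextual-bandit setting: post-stratifying a vanilla importance-sampling (IS) estimate on a known discrete concept. This is an illustration of the underlying statistical mechanism, not the paper's actual estimator family; the policies, reward model, and concept distribution below are all hypothetical.

```python
import numpy as np

def simulate(n, rng):
    """Synthetic logged bandit data: concept label c, action from a
    behavior policy, and a noisy reward that depends on the concept."""
    c = rng.integers(0, 3, size=n)                 # known concept per sample (hypothetical)
    p_b = np.array([0.7, 0.3])                     # behavior policy over 2 actions
    a = rng.choice(2, size=n, p=p_b)
    r = (c == a).astype(float) + 0.1 * rng.standard_normal(n)
    return c, a, r, p_b

def estimates(c, a, r, p_b, p_e, p_c):
    w = p_e[a] / p_b[a]                            # importance weights
    v_is = np.mean(w * r)                          # vanilla IS estimate
    # Concept-stratified estimate: per-concept IS means recombined with the
    # known concept marginals p_c. This targets the same policy value
    # (so it stays unbiased) but removes the variance contributed by
    # random fluctuations in the concept frequencies.
    v_strat = sum(p_c[k] * np.mean((w * r)[c == k]) for k in range(3))
    return v_is, v_strat

rng = np.random.default_rng(0)
p_e = np.array([0.4, 0.6])                         # target policy to evaluate
p_c = np.full(3, 1 / 3)                            # known concept marginals
runs = np.array([estimates(*simulate(1000, rng), p_e, p_c) for _ in range(2000)])
print("means:", runs.mean(axis=0))                 # both near the true value (1/3 here)
print("variances:", runs.var(axis=0))              # stratified variance is smaller
```

Across replications, both estimators center on the same target-policy value, while the concept-stratified version shows lower variance because much of the reward heterogeneity is explained by the concept strata.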
Submission Number: 208