Unraveling the Interplay between Carryover Effects and Reward Autocorrelations in Switchback Experiments

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: A/B testing has become the gold standard for policy evaluation in modern technological industries. Motivated by the widespread use of switchback experiments in A/B testing, this paper conducts a comprehensive comparative analysis of various switchback designs in Markovian environments. Unlike many existing works that derive the optimal design based on specific and relatively simple estimators, our analysis covers a range of state-of-the-art estimators developed in the reinforcement learning literature. It reveals that the effectiveness of different switchback designs depends crucially on (i) the size of the carryover effect and (ii) the autocorrelations among reward errors over time. Moreover, these findings are estimator-agnostic, i.e., they apply to all the aforementioned estimators. Based on these insights, we provide a workflow that offers guidelines for practitioners on designing switchback experiments in A/B testing.
Lay Summary: A/B testing has become the gold standard for policy evaluation in modern technological industries. This paper is motivated by the widespread use of switchback experiments—where a baseline and a new policy alternate at fixed intervals—and presents a comprehensive comparative analysis of various switchback designs in Markovian environments. Unlike many existing studies that derive optimal designs based on specific and relatively simple estimators, our analysis incorporates a range of advanced estimators developed in the reinforcement learning (RL) literature. We show that the effectiveness of different switchback designs is highly dependent on two key factors: (i) the size of the carryover effect—the influence of previous treatments on future outcomes, and (ii) the autocorrelations among reward errors over time. Notably, these findings are estimator-agnostic, meaning they apply to most RL estimators. Building on these insights, we propose a workflow that provides practical guidelines for practitioners on designing switchback experiments in A/B testing.
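The lay summary describes a switchback experiment as one where a baseline and a new policy alternate at fixed intervals. As a minimal illustrative sketch (not code from the linked repository; the function name and signature are hypothetical), such a treatment schedule can be generated as follows:

```python
def switchback_schedule(horizon, interval, start_with_new=False):
    """Return a list of treatment indicators for a switchback design:
    0 = baseline policy, 1 = new policy, switching every `interval` steps."""
    schedule = []
    policy = 1 if start_with_new else 0
    for t in range(horizon):
        if t > 0 and t % interval == 0:
            policy = 1 - policy  # switch at each interval boundary
        schedule.append(policy)
    return schedule

print(switchback_schedule(8, 2))  # → [0, 0, 1, 1, 0, 0, 1, 1]
```

Shorter intervals switch policies more often, which (per the paper's findings) trades off differently against carryover effects and reward-error autocorrelation.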
Link To Code: https://github.com/QianglinSIMON/SwitchMDP
Primary Area: General Machine Learning->Causality
Keywords: Policy Evaluation; A/B Testing; Reinforcement Learning; Switchback; Experimental Design
Submission Number: 7041