Strategy-driven Central Limit Theorem for Sequential Test

ICLR 2026 Conference Submission15086 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Heterogeneous Treatment Effects; Sequential Testing; Strategy Limit Theory; A/B Testing

Abstract: A/B testing is a critical tool for evaluating the effectiveness of strategies, but its conclusions are typically limited to the Average Treatment Effect (ATE). A more fundamental question arises when deciding whether to implement personalized interventions: do Heterogeneous Treatment Effects (HTE) exist at all? This paper addresses the problem of testing for the existence of HTE. Current methods based on the t-test are effective, but a central goal of statistical inference is higher test power, so that subtle heterogeneous effects can be detected more sensitively. To this end, this paper proposes a novel sequential testing framework based on Strategy Limit Theory, specifically designed to identify these hard-to-detect, subtle differences more effectively. The main contributions are as follows: (i) We integrate HTE existence testing into a strategic decision-making process and construct a new test statistic based on Strategy Limit Theory, weighted by a parameter $\lambda$ that controls the Type I error. By maximizing the divergence between the distributions under the null and alternative hypotheses, we increase the test's power. (ii) We extend this approach to online experimental settings and introduce a Bi-Optimal Strategy (BOS), which not only improves statistical power but also significantly increases the cumulative reward of the experiment. (iii) We develop a complete sequential testing procedure that combines an alpha-spending function with the bootstrap to determine dynamic stopping boundaries, accommodating the complex joint distribution of our statistic. (iv) We validate the effectiveness and superiority of the proposed method through extensive simulation experiments and an empirical analysis on Tenrec, a real-world dataset from Tencent's recommendation system.
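The abstract names the combination of an alpha-spending function with the bootstrap but does not give its details. The following is a minimal sketch of how such boundaries can be calibrated in general, under illustrative assumptions that are not taken from the paper: a Lan-DeMets O'Brien-Fleming-type spending function, and a simple two-sided standardized-mean statistic standing in for the Strategy Limit Theory statistic. Null paths of the statistic are simulated by resampling recentred data. At each look, the threshold is chosen so that the newly spent share of alpha corresponds to the surviving paths that cross it. The names `obf_spending` and `bootstrap_boundaries` are hypothetical.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def obf_spending(t: float, alpha: float) -> float:
    """Lan-DeMets O'Brien-Fleming-type alpha-spending function (an assumed choice).

    Returns the cumulative Type I error that may be spent by information
    fraction t in (0, 1]; it equals alpha at t = 1.
    """
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 - 2 * _STD_NORMAL.cdf(z / t ** 0.5)


def bootstrap_boundaries(data, looks, alpha=0.05, n_boot=2000, seed=0):
    """Hypothetical helper: stopping boundaries for a two-sided |z| statistic.

    `data` are per-unit outcomes (a placeholder for the paper's statistic);
    `looks` are increasing cumulative sample sizes at the interim analyses.
    Reject H0 at look k if the observed statistic is >= the k-th boundary.
    """
    rng = random.Random(seed)
    n = len(data)
    mean = sum(data) / n
    centered = [x - mean for x in data]  # recentre so the null (zero mean) holds
    sd = (sum(c * c for c in centered) / (n - 1)) ** 0.5

    # Simulate n_boot bootstrap paths of the statistic across all looks under H0.
    paths = []
    for _ in range(n_boot):
        sample = rng.choices(centered, k=looks[-1])
        path, total, prev = [], 0.0, 0
        for nk in looks:
            total += sum(sample[prev:nk])
            prev = nk
            path.append(abs(total) / (nk ** 0.5 * sd))
        paths.append(path)

    alive = list(range(n_boot))  # null paths that have not crossed yet
    bounds = []
    for k, nk in enumerate(looks):
        # Cumulative number of null paths allowed to have crossed by look k.
        target = int(obf_spending(nk / looks[-1], alpha) * n_boot + 1e-9)
        n_reject = target - (n_boot - len(alive))
        stats = sorted((paths[i][k] for i in alive), reverse=True)
        if n_reject <= 0 or not stats:
            c = float("inf")  # no alpha left to spend at this look
        else:
            c = stats[min(n_reject, len(stats)) - 1]
        bounds.append(c)
        # Paths at or above the boundary stop here; the rest continue.
        alive = [i for i in alive if paths[i][k] < c]
    return bounds
```

With this spending function the earliest looks spend almost no alpha, so the first boundary is often infinite and later boundaries decrease toward roughly 1.96 at the final look. That is a common conservative design, but it is not necessarily the spending function or the statistic used in the paper. Because the boundaries come from simulated null paths rather than a closed-form joint distribution, the same loop applies to any statistic whose null paths can be resampled. This reflects the motivation stated in contribution (iii).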

Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)

Submission Number: 15086
