Rethinking Causal Ranking: A Balanced Perspective on Uplift Model Evaluation

Minqin Zhu; Zexu Sun; Ruoxuan Xiong; Anpeng Wu; Baohong Li; Caizhi Tang; JUN ZHOU; Fei Wu; Kun Kuang

Published: 01 May 2025, Last Modified: 18 Jun 2025. Venue: ICML 2025 poster. License: CC BY 4.0

Abstract: Uplift modeling is crucial for identifying individuals likely to respond to a treatment in applications like marketing and customer retention, but evaluating these models is challenging because counterfactual outcomes are inaccessible in real-world settings. In this paper, we identify a fundamental limitation in existing evaluation metrics, such as the uplift and Qini curves: they fail to accurately rank individuals with binary negative outcomes. This can lead to biased evaluations, where biased models receive higher curve values than unbiased ones, resulting in suboptimal model selection. To address this, we propose the Principled Uplift Curve (PUC), a novel evaluation metric that assigns equal curve values to individuals with positive and negative binary outcomes, offering a more balanced and unbiased assessment. We then derive the Principled Uplift Loss (PUL) function from the PUC and integrate it into a new uplift model, the Principled Treatment and Outcome Network (PTONet), to reduce bias during uplift model training. Experiments on both simulated and real-world datasets demonstrate that the PUC provides less biased evaluations, while PTONet outperforms existing methods. The source code is available at: https://github.com/euzmin/PUC.
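
For readers unfamiliar with the conventional metrics the abstract refers to, the following is a minimal sketch of one common formulation of the Qini curve: individuals are ranked by predicted uplift, and the value at each cutoff is the number of treated responders minus the number of control responders rescaled to the treated group size. It is not the PUC and not the authors' code; the function name `qini_curve` and the argument names `scores`, `treatment`, and `outcome` are illustrative assumptions, as is the choice to use a control term of zero before any control individual has been seen.

```python
from typing import List, Sequence


def qini_curve(scores: Sequence[float],
               treatment: Sequence[int],
               outcome: Sequence[int]) -> List[float]:
    """Conventional Qini curve values for the top-k individuals, k = 1..n.

    scores    -- predicted uplift per individual (hypothetical model output)
    treatment -- 1 if the individual was treated, 0 if in the control group
    outcome   -- observed binary outcome, 1 (positive) or 0 (negative)
    """
    # Rank individuals by predicted uplift, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    n_t = n_c = 0  # treated / control counts within the top-k
    y_t = y_c = 0  # positive outcomes among treated / control within the top-k
    curve = []
    for i in order:
        if treatment[i] == 1:
            n_t += 1
            y_t += outcome[i]
        else:
            n_c += 1
            y_c += outcome[i]
        # Rescale control responders to the treated group size.
        # Assumption: the control term is 0 until a control individual appears.
        control_term = y_c * n_t / n_c if n_c > 0 else 0.0
        curve.append(y_t - control_term)
    return curve
```

In this formulation only positive outcomes move the curve, so an individual with a negative outcome adds nothing to the responder counts wherever it is ranked; this is the kind of behaviour the abstract's criticism concerns.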

Lay Summary: In this paper, we identify a fundamental limitation of conventional evaluation metrics for ranking individuals by causal effect: they cannot accurately rank individuals with binary negative outcomes. This limitation can result in biased evaluations, where models with systematic biases receive higher curve values than unbiased ones, ultimately leading to suboptimal model selection. To address this issue, we propose a novel metric that treats individuals with positive and negative binary outcomes equally in curve construction. Furthermore, we introduce a strategy for using this metric to guide the optimization of uplift models.

Link To Code: https://github.com/euzmin/PUC

Primary Area: General Machine Learning->Causality

Keywords: Uplift Modeling; Causal Ranking; Qini Curve; Marketing

Submission Number: 4427
