Abstract: Learning-to-Rank (LTR) models trained from implicit feedback (e.g., clicks) suffer from inherent biases. A well-known one is position bias: documents in top positions are more likely to receive clicks, due in part to their position advantage. To learn rankers without this bias, existing counterfactual frameworks first estimate the propensity (probability) of a click being observed, using intervention data from a small portion of search traffic, and then use the inverse propensity score (IPS) to debias LTR algorithms on the whole data set. These approaches often assume that the propensity depends only on the position of the document, which can cause high estimation variance in applications where the search context (e.g., query, user) varies frequently. While context-dependent propensity models reduce this variance, accurate estimation may require randomization or intervention on a large amount of traffic, which may not be feasible in real-world systems, especially for long-tail queries. In this work, we
employ heterogeneous treatment effect estimation techniques to estimate position bias when intervention click data is limited. We then use these estimates to debias the observed click distribution and re-draw a new, de-biased data set that can be used by any LTR algorithm. We conduct simulations under varying experimental conditions and show the effectiveness of the proposed method in regimes with long-tail queries and sparse clicks.
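
To make the final debias-and-redraw step concrete, the sketch below illustrates one plausible reading of it; this is not the paper's exact procedure, and the column names, the per-pair IPS aggregation, and the Bernoulli re-draw are assumptions made for illustration. It corrects observed click rates with estimated examination propensities, then samples a new de-biased click label per impression, yielding a data set that any LTR algorithm could consume.

import numpy as np
import pandas as pd

def redraw_debiased_clicks(log, rng):
    # `log` is a hypothetical impression log with columns:
    #   query_id, doc_id  -- identifies a query-document pair
    #   click             -- 0/1 observed click
    #   propensity        -- estimated examination probability in (0, 1]
    # IPS estimate of the examination-free click rate per query-document
    # pair: total clicks divided by total propensity over its impressions,
    # clipped so it remains a valid probability.
    grouped = log.groupby(["query_id", "doc_id"])
    rate = (grouped["click"].sum() / grouped["propensity"].sum()).clip(0.0, 1.0)
    rate = rate.rename("debiased_rate")
    # Re-draw one de-biased click per impression from the corrected rate.
    out = log.join(rate, on=["query_id", "doc_id"])
    out["debiased_click"] = rng.binomial(1, out["debiased_rate"].to_numpy())
    return out

# Tiny synthetic example: doc "b" is shown at a low-propensity position,
# so its observed click rate is scaled up before re-drawing.
log = pd.DataFrame({
    "query_id":   [1, 1, 1, 1],
    "doc_id":     ["a", "a", "b", "b"],
    "click":      [1, 0, 0, 1],
    "propensity": [0.9, 0.9, 0.3, 0.3],
})
rng = np.random.default_rng(0)
print(redraw_debiased_clicks(log, rng))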