Abstract: Learning-to-Rank (LTR) models trained from implicit feedback (e.g., clicks) suffer from inherent biases. A well-known one is position bias: documents in top positions are more likely to receive clicks, due in part to their position advantage. To learn rankers without this bias, existing counterfactual frameworks first estimate the propensity (probability) of a click being observed, using intervention data from a small portion of search traffic, and then use the inverse propensity score (IPS) to debias LTR algorithms on the whole data set. These approaches often assume that the propensity depends only on the position of the document, which can cause high estimation variance in applications where the search context (e.g., query, user) varies frequently. While context-dependent propensity models reduce this variance, accurate estimation may require randomization or intervention on a large amount of traffic, which may not be feasible in real-world systems, especially for long-tail queries. In this work, we
employ heterogeneous treatment effect estimation techniques to estimate position bias when intervention click data is limited. We then use these estimates to debias the observed click distribution and re-draw a new, de-biased data set that can be used by any LTR algorithm. We conduct simulations under varying experimental conditions and show the effectiveness of the proposed method in regimes with long-tail queries and sparse clicks.
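
To make the final debias-and-redraw step concrete, the sketch below illustrates one plausible reading of it; this is not the paper's exact procedure, and the column names, the per-pair IPS aggregation, and the Bernoulli re-draw are assumptions made for illustration. It corrects observed click rates with estimated examination propensities, then samples a new de-biased click label per impression, yielding a data set that any LTR algorithm could consume.

import numpy as np
import pandas as pd

def redraw_debiased_clicks(log, rng):
    # `log` is a hypothetical impression log with columns:
    #   query_id, doc_id  -- identifies a query-document pair
    #   click             -- 0/1 observed click
    #   propensity        -- estimated examination probability in (0, 1]
    # IPS estimate of the examination-free click rate per query-document
    # pair: total clicks divided by total propensity over its impressions,
    # clipped so it remains a valid probability.
    grouped = log.groupby(["query_id", "doc_id"])
    rate = (grouped["click"].sum() / grouped["propensity"].sum()).clip(0.0, 1.0)
    rate = rate.rename("debiased_rate")
    # Re-draw one de-biased click per impression from the corrected rate.
    out = log.join(rate, on=["query_id", "doc_id"])
    out["debiased_click"] = rng.binomial(1, out["debiased_rate"].to_numpy())
    return out

# Tiny synthetic example: doc "b" is shown at a low-propensity position,
# so its observed click rate is scaled up before re-drawing.
log = pd.DataFrame({
    "query_id":   [1, 1, 1, 1],
    "doc_id":     ["a", "a", "b", "b"],
    "click":      [1, 0, 0, 1],
    "propensity": [0.9, 0.9, 0.3, 0.3],
})
rng = np.random.default_rng(0)
print(redraw_debiased_clicks(log, rng))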