Learning From Weights: Cost-Sensitive Approach For Retrieval

COMAD/CODS 2020 · Published 2020 (modified: 22 Dec 2022)
Abstract: In sponsored search, the top few relevant ads must be retrieved from millions of ad copies within milliseconds. Running deep learning or interaction models online is computationally expensive and hence infeasible for retrieval. There has been little discussion of improving cost-effective online retrieval models based on representation learning. In this paper we discuss one such improvement: incorporating cost-sensitive training. Online retrieval models trained on click-through data treat every clicked query-document pair as equivalent. While training on click-through data is reasonable, this paper argues that it is sub-optimal because of the data's noisy and long-tail nature (especially in sponsored search). We discuss the impact of including or disregarding long-tail pairs in the training set, and we propose a weighting-based strategy with which we can learn semantic representations for tail pairs without compromising retrieval quality. Online A/B testing on live search engine traffic showed improvements in clicks (11.8% higher CTR) as well as in quality (8.2% lower bounce rate) when compared to the unweighted model. To demonstrate the efficacy of the model, we also ran offline experiments with a Bi-LSTM-based representation model.
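The core idea of the abstract, assigning per-pair loss weights so that long-tail clicked query-document pairs contribute to training without dominating it, can be sketched roughly as follows. The sub-linear frequency-based weight formula, the cosine loss, and all function names here are illustrative assumptions, not the authors' exact method:

```python
import numpy as np

def pair_weights(click_counts, alpha=0.5):
    # Hypothetical cost-sensitive weighting: a sub-linear power
    # transform damps frequent (head) pairs relative to raw click
    # counts, so rare (tail) pairs retain influence in training.
    counts = np.asarray(click_counts, dtype=float)
    w = counts ** alpha
    return w / w.sum()  # normalize weights to sum to 1

def weighted_similarity_loss(q_emb, d_emb, weights):
    # Representation-based retrieval loss for clicked pairs:
    # each pair's (1 - cosine similarity) term is scaled by its
    # cost-sensitive weight instead of treating pairs as equal.
    q = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
    d = d_emb / np.linalg.norm(d_emb, axis=1, keepdims=True)
    cos = np.sum(q * d, axis=1)
    return float(np.sum(weights * (1.0 - cos)))
```

With `alpha=0.5`, a head pair clicked 100 times gets only 10x (not 100x) the weight of a pair clicked once, which is one simple way to realize the paper's goal of learning tail representations without letting head pairs dominate.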