Selecting Training Data for Learning-Based Twitter Search

Dongxing Li, Ben He, Tiejian Luo, Xin Zhang

Published: 2015, Last Modified: 13 Nov 2024, Venue: ECIR 2015, License: CC BY-SA 4.0

Abstract: Learning to rank is widely applied as an effective weighting scheme for Twitter search. As most learning to rank approaches are based on supervised learning, their effectiveness can be affected by the inclusion of low-quality training data. In this paper, we propose a simple and effective approach that learns a query quality classifier, which automatically selects the training data on a per-query basis. Experimental results on the TREC Tweets13 collection show that our proposed approach outperforms the conventional application of learning to rank, which learns the ranking model on all available training queries.
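The per-query selection idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which classifier, features, or labels the authors use, so everything below is an assumption made for illustration: the logistic-regression classifier, the two toy features per query, the binary "helpful" labels, the 0.5 threshold, and the function names `train_logistic`, `predict_quality`, and `select_training_queries` are all hypothetical.

```python
import math


def train_logistic(features, labels, lr=0.1, epochs=500):
    """Fit a logistic-regression model with per-example gradient descent.

    Assumption: logistic regression stands in for the query quality
    classifier; the abstract does not name a specific model.
    """
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict_quality(model, x):
    """Return the estimated probability that a query is useful training data."""
    weights, bias = model
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def select_training_queries(model, candidates, threshold=0.5):
    """Keep only the candidate queries whose predicted quality meets the threshold."""
    return [qid for qid, x in candidates.items()
            if predict_quality(model, x) >= threshold]


# Hypothetical toy data: two made-up features per query, and a label of 1
# if training on that query was judged helpful, 0 otherwise.
labeled_features = [[0.9, 0.2], [0.8, 0.4], [0.7, 0.3],
                    [0.1, 0.5], [0.2, 0.1], [0.15, 0.3]]
labels = [1, 1, 1, 0, 0, 0]

model = train_logistic(labeled_features, labels)

# Candidate training queries (hypothetical ids and feature values).
candidates = {"q101": [0.85, 0.3], "q102": [0.05, 0.4], "q103": [0.75, 0.1]}
selected = select_training_queries(model, candidates)
print(selected)
```

In a full pipeline, only the queries in `selected` would be passed on to the learning to rank algorithm, whereas the conventional baseline described in the abstract would train on every candidate query.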