Predicting the Size of Candidate Document Set for Implicit Web Search Result Diversification

Yasar Baris Ulu, Ismail Sengor Altingovde

Published: 2020, Last Modified: 19 Feb 2025, ECIR (2) 2020, CC BY-SA 4.0

Abstract: Implicit result diversification methods exploit the content of the documents in the candidate set, i.e., the initial retrieval results of a query, to obtain a relevant and diverse ranking. As our first contribution, we explore whether recently introduced word embeddings can be exploited for representing documents to improve diversification, and show a positive result. As a second improvement, we propose to automatically predict the size of the candidate set on a per-query basis. Experimental evaluations using our BM25 runs as well as the best-performing ad hoc runs submitted to TREC (2009–2012) show that our approach improves the performance of implicit diversification by up to 5.4% with respect to the initial ranking.
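
To make the role of the candidate set size concrete, the following is a minimal sketch of implicit diversification over the top-N results of an initial ranking. It is not the method evaluated in the paper: the abstract does not name the diversification algorithm, so the sketch substitutes the well-known Maximal Marginal Relevance (MMR) greedy criterion as an illustration. The function names (`cosine`, `mmr_diversify`), the parameters (`candidate_size`, `lam`, `k`), and the toy document vectors are all assumptions introduced here; in practice the vectors might be derived from word embeddings, and `candidate_size` would be the value predicted per query.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_diversify(ranked, vectors, candidate_size, lam=0.5, k=10):
    """Greedy MMR re-ranking restricted to the top `candidate_size` documents.

    ranked: list of (doc_id, relevance) pairs, sorted by descending relevance.
    vectors: dict mapping doc_id to a document vector (hypothetical representation).
    Returns up to k doc_ids in diversified order.
    """
    # Only the candidate set takes part in diversification.
    remaining = dict(ranked[:candidate_size])
    selected = []
    while remaining and len(selected) < k:
        def score(doc_id):
            # Penalize similarity to the most similar already-selected document.
            max_sim = max(
                (cosine(vectors[doc_id], vectors[s]) for s in selected),
                default=0.0,
            )
            return lam * remaining[doc_id] - (1 - lam) * max_sim

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


# Toy data (made up for illustration): d1 and d2 are near-duplicates, d3 differs.
ranked = [("d1", 1.0), ("d2", 0.9), ("d3", 0.8)]
vectors = {"d1": [1.0, 0.0], "d2": [1.0, 0.0], "d3": [0.0, 1.0]}

print(mmr_diversify(ranked, vectors, candidate_size=3, k=2))  # ['d1', 'd3']
print(mmr_diversify(ranked, vectors, candidate_size=2, k=2))  # ['d1', 'd2']
```

In this toy case, a candidate set of size 3 lets the dissimilar document d3 displace the redundant d2, while a candidate set of size 2 excludes d3 entirely. This is only meant to illustrate why the choice of candidate set size can affect the diversified ranking.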