Context-Aware Query Term Difficulty Estimation for Performance Prediction

Abbas Saleminezhad, Negar Arabzadeh, Soosan Beheshti, Ebrahim Bagheri

Published: 2024, Last Modified: 06 Jan 2026, ECIR (4) 2024, CC BY-SA 4.0

Abstract: Prior research has found that many retrieval methods are sensitive to the choice and order of the terms in a query, which can significantly affect retrieval effectiveness. We capitalize on this finding to predict the performance of a query. More specifically, we propose to learn query term difficulty weights within the context of each query, which can then serve as indicators of whether each term is likely to make the query more or less effective. We show how such difficulty weights can be learnt by fine-tuning a language model. In addition, we propose an approach that integrates the learnt weights into a cross-encoder architecture to predict query performance. Experiments on the MSMARCO collection and its associated, widely used TREC Deep Learning track query sets demonstrate that our method delivers consistently strong performance prediction across different query sets (MSMARCO Dev, TREC DL’19, ’20, Hard) and a range of evaluation metrics (Kendall, Spearman, sMARE).
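
To make the evaluation metrics named in the abstract concrete, the following is a minimal sketch of how predicted and actual per-query effectiveness could be compared. It is not the authors' evaluation code. It assumes no tied scores, uses Kendall's tau-a and the rank-difference form of Spearman's rho, and takes sMARE to be the mean absolute difference between predicted and actual query ranks, scaled by the number of queries. The function names and the example scores are hypothetical.

```python
from itertools import combinations


def ranks(values):
    """Rank positions (1 = largest value). Assumes no ties, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def kendall_tau(predicted, actual):
    """Kendall tau-a: (concordant pairs - discordant pairs) / all pairs."""
    n = len(predicted)
    score = 0
    for i, j in combinations(range(n), 2):
        s = (predicted[i] - predicted[j]) * (actual[i] - actual[j])
        score += (s > 0) - (s < 0)  # +1 if concordant, -1 if discordant
    return score / (n * (n - 1) / 2)


def spearman_rho(predicted, actual):
    """Spearman rho via squared rank differences (valid when there are no ties)."""
    n = len(predicted)
    rp, ra = ranks(predicted), ranks(actual)
    d2 = sum((p - a) ** 2 for p, a in zip(rp, ra))
    return 1 - 6 * d2 / (n * (n * n - 1))


def smare(predicted, actual):
    """Scaled mean absolute rank error: mean of |rank_pred - rank_actual| / n."""
    n = len(predicted)
    rp, ra = ranks(predicted), ranks(actual)
    return sum(abs(p - a) for p, a in zip(rp, ra)) / (n * n)


# Hypothetical scores for four queries: predictor output vs. measured effectiveness.
predicted_scores = [0.9, 0.2, 0.5, 0.7]
actual_scores = [0.8, 0.1, 0.6, 0.4]

print(kendall_tau(predicted_scores, actual_scores))
print(spearman_rho(predicted_scores, actual_scores))
print(smare(predicted_scores, actual_scores))
```

Higher Kendall and Spearman values indicate that the predictor orders queries more like their measured effectiveness does, while a lower sMARE indicates smaller rank displacement per query.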