Coherence-based Query Performance Measures for Dense Retrieval

Maria Vlachou; Craig MacDonald

Published: 07 Jun 2024, Last Modified: 07 Jun 2024, Venue: ICTIR 2024, License: CC BY 4.0

Keywords: query performance prediction, coherence-based, single-representation dense retrieval

TL;DR: We propose a number of neural embedding-based query performance measures designed for single-representation dense retrieval models, taking into account the different types of queries and the selection of evaluation measures.

Abstract: Query Performance Prediction (QPP) estimates the effectiveness of a search engine’s results in response to a query without relevance judgments. Traditionally, post-retrieval predictors have focused upon either the distribution of the retrieval scores, or the coherence of the top-ranked documents using traditional bag-of-words index representations. More recently, BERT-based models using dense embedded document representations have been used to create new predictors, but mostly applied to predict the performance of rankings created by BM25. Instead, we aim to predict the effectiveness of rankings created by single-representation dense retrieval models (ANCE & TCT-ColBERT). Therefore, we propose a number of variants of existing unsupervised coherence-based predictors that employ neural embedding representations. In our experiments on the TREC Deep Learning Track datasets, we demonstrate improved accuracy upon dense retrieval (up to 92% compared to sparse variants for TCT-ColBERT and 188% for ANCE). Going deeper, we select the most representative and best performing predictors to study the importance of differences among predictors and query types on query performance. Using existing distribution-based evaluation QPP measures and a particular type of linear mixed model, we find that query types further significantly influence query performance (and are up to 35% responsible for the unstable performance of QPP predictors), and that this sensitivity is unique to dense retrieval models. In particular, we find that in the cases where our predictors perform lower than score-based predictors, this is partially due to the sensitivity of MAP@100 to query types. Our novel analysis provides new insights into dense QPP that can explain potential unstable performance of existing predictors and outlines the unique characteristics of different query types on dense retrieval models.
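
Illustration: The abstract does not define the proposed predictors, so the following is only a minimal sketch of the general idea behind an embedding-based coherence measure, not the authors' method. It assumes that coherence is taken as the mean pairwise cosine similarity among the embeddings of the top-ranked documents; the function names (`cosine`, `pairwise_coherence`) and the toy vectors are hypothetical and chosen purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_coherence(doc_embeddings):
    """Mean cosine similarity over all distinct pairs of document embeddings.

    Assumption: doc_embeddings holds the dense vectors of the top-k retrieved
    documents for one query. A higher value means the top-ranked documents
    are closer to each other in the embedding space.
    """
    n = len(doc_embeddings)
    if n < 2:
        return 0.0  # coherence is undefined for fewer than two documents
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += cosine(doc_embeddings[i], doc_embeddings[j])
            pairs += 1
    return total / pairs


# Toy 2-dimensional "embeddings" (hypothetical values, not real model output).
tight = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05]]       # documents pointing the same way
scattered = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]    # documents pointing apart

print(pairwise_coherence(tight) > pairwise_coherence(scattered))
```

Under this assumption, a query whose top-ranked documents cluster tightly receives a higher score than one whose results are scattered, which is the intuition that coherence-based predictors rely on.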

Submission Number: 22
