Evaluating Retrieval Models through Histogram Analysis

SIGIR 2015
Abstract: We present a novel approach for efficiently evaluating the performance of retrieval models and introduce two evaluation metrics: Distributional Overlap (DO), which compares how tightly the scores of relevant and non-relevant documents cluster, and Histogram Slope Analysis (HSA), which examines the log of the empirical score distributions of relevant and non-relevant documents. Unlike rank-based evaluation metrics such as mean average precision (MAP) and normalized discounted cumulative gain (NDCG), DO and HSA require computing model scores only for queries paired with a fixed sample of relevant and non-relevant documents, rather than scoring the entire collection, even implicitly by means of an inverted index. In experimental meta-evaluations, we find that HSA achieves high correlation with MAP and NDCG on a monolingual and a cross-language document similarity task; on four ad-hoc web retrieval tasks; and on an analysis of ten TREC tasks from the past ten years. In addition, when evaluating latent Dirichlet allocation (LDA) models on document similarity tasks, HSA achieves better correlation with MAP and NDCG than perplexity, an intrinsic metric widely used with topic models.
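The abstract does not give formal definitions of DO or HSA, but its descriptions suggest computations along the following lines. The sketch below is a hypothetical Python illustration, not the paper's actual method: the bin count, the least-squares fit on log histogram counts, and the use of an overlap coefficient between normalized histograms are all assumed details.

```python
import numpy as np

def histogram_slope(scores, bins=20):
    # Hypothetical sketch of Histogram Slope Analysis (HSA):
    # fit a line to the log of the empirical score histogram and
    # return its slope. Binning and fitting choices are assumptions;
    # the abstract does not specify them.
    counts, edges = np.histogram(np.asarray(scores), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    mask = counts > 0                       # avoid log(0) on empty bins
    slope, _ = np.polyfit(centers[mask], np.log(counts[mask]), 1)
    return slope

def distributional_overlap(rel_scores, nonrel_scores, bins=20):
    # Hypothetical sketch of Distributional Overlap (DO): measure how
    # much the normalized score histograms of relevant and non-relevant
    # documents overlap (0 = fully separated, 1 = identical).
    rel = np.asarray(rel_scores)
    non = np.asarray(nonrel_scores)
    lo, hi = min(rel.min(), non.min()), max(rel.max(), non.max())
    r, _ = np.histogram(rel, bins=bins, range=(lo, hi), density=True)
    n, _ = np.histogram(non, bins=bins, range=(lo, hi), density=True)
    width = (hi - lo) / bins
    return float(np.sum(np.minimum(r, n)) * width)
```

Under this reading, a model separates relevant from non-relevant documents well when DO is low, and the HSA slopes characterize the shape of each score distribution without ranking the full collection.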