A document rating system for preference judgements

Maryam Bashir, Jesse Anderton, Jie Wu, Peter B. Golbus, Virgil Pavlu, Javed A. Aslam

2013 (modified: 11 Nov 2022)SIGIR 2013Readers: Everyone

Abstract: High quality relevance judgments are essential for the evaluation of information retrieval systems. Traditional methods of collecting relevance judgments are based on collecting binary or graded nominal judgments, but such judgments are limited by factors such as inter-assessor disagreement and the arbitrariness of grades. Previous research has shown that it is easier for assessors to make pairwise preference judgments. However, unless the preferences collected are largely transitive, it is not clear how to combine them in order to obtain document relevance scores. Another difficulty is that the number of pairs that need to be assessed is quadratic in the number of documents. In this work, we consider the problem of inferring document relevance scores from pairwise preference judgments by analogy to tournaments using the Elo rating system. We show how to combine a linear number of pairwise preference judgments from multiple assessors to compute relevance scores for every document.

0 Replies