Abstract: We consider the impact of inter-assessor disagreement on the maximum performance that a ranker can hope to achieve. We demonstrate that even if a ranker were to achieve perfect performance with respect to one assessor, its measured performance decreases significantly when it is evaluated against the judgments of a different assessor. This decrease in performance may largely account for observed limits on the performance of learning-to-rank algorithms.
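The effect described in the abstract can be illustrated with a minimal simulation sketch, which is not the paper's experimental setup: two hypothetical assessors assign binary relevance labels that agree only part of the time, a ranker orders documents perfectly for the first assessor, and its average precision is then measured against the second assessor's labels. The pool size, relevance prevalence, and agreement rate below are illustrative assumptions.

```python
import random

def average_precision(ranking, relevant):
    """AP of a ranked list of doc ids against a set of relevant doc ids."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / max(len(relevant), 1)

random.seed(0)
n_docs, prevalence, agreement = 1000, 0.1, 0.8  # assumed, for illustration only

# Assessor A's binary judgments; assessor B agrees with probability `agreement`.
judg_a = {d: random.random() < prevalence for d in range(n_docs)}
judg_b = {d: judg_a[d] if random.random() < agreement else not judg_a[d]
          for d in range(n_docs)}

# A "perfect" ranker for assessor A: all of A's relevant documents come first.
ranking = sorted(range(n_docs), key=lambda d: not judg_a[d])

rel_a = {d for d, r in judg_a.items() if r}
rel_b = {d for d, r in judg_b.items() if r}
print("AP vs. assessor A:", round(average_precision(ranking, rel_a), 3))  # 1.0 by construction
print("AP vs. assessor B:", round(average_precision(ranking, rel_b), 3))  # noticeably lower
```

Even though the ranking is optimal for assessor A, documents that only assessor B considers relevant fall to the bottom of the list, so the score against B is bounded well below 1.0; this is the kind of ceiling on measurable ranker performance that the abstract refers to.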