Abstract: We address the problem of assessing the quality of a ranking system (e.g., search engine, recommender system, review ranker) given a fixed budget for collecting expert judgments. In particular, we propose a method that selects which items to judge in order to optimize the accuracy of the quality estimate. Our method is not only efficient, but also provides estimates that are unbiased --- unlike common approaches that tend to underestimate performance or that have a bias against new systems that are evaluated re-using previous relevance scores.
0 Replies
Loading