Constructing Test Collections using Multi-armed Bandits and Active Learning
Abstract: While test collections provide the cornerstone of system-based evaluation in information retrieval, human relevance judging has become prohibitively expensive as collections have grown ever larger. Consequently, intelligently deciding which documents to judge has become increasingly important. We propose a two-phase approach to intelligent judging across topics which does not require document rankings from a shared task. In the first phase, we dynamically select the next topic to judge via a multi-armed bandit method. In the second phase, we employ active learning to select which document to judge next for that topic. Experiments on three TREC collections (with varying scarcity of relevant documents) achieve τ ≈ 0.90 correlation on the P@10 ranking and find 90% of the relevant documents at 48% of the original budget. To support reproducibility and follow-on work, we have shared our code online.
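To make the two-phase loop concrete, the sketch below walks through one possible instantiation. The abstract does not name the specific bandit or active-learning strategies, so this example assumes Thompson sampling with Beta priors for topic selection and uncertainty sampling with a per-topic logistic regression over TF-IDF features for document selection; the toy corpus and reward definition are illustrative only, not the paper's actual setup.

```python
"""Minimal sketch of a two-phase judging loop (assumed methods, toy data)."""
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy pools: topic -> list of (doc_text, true_relevance) standing in for
# the unjudged documents of each topic (hypothetical data).
pools = {
    "t1": [("neural ranking models", 1), ("cooking pasta recipes", 0),
           ("bm25 retrieval baseline", 1), ("holiday travel tips", 0)],
    "t2": [("query expansion methods", 1), ("gardening in spring", 0),
           ("football match report", 0), ("stock market news", 0)],
}
judged = {t: [] for t in pools}   # (doc_text, label) pairs judged so far
alpha = {t: 1.0 for t in pools}   # Beta posterior: relevant judgments + 1
beta = {t: 1.0 for t in pools}    # Beta posterior: non-relevant judgments + 1
vectorizer = TfidfVectorizer().fit(
    [d for docs in pools.values() for d, _ in docs])

def pick_topic():
    """Phase 1: Thompson sampling -- sample a relevance rate per topic."""
    candidates = [t for t in pools if pools[t]]
    samples = {t: rng.beta(alpha[t], beta[t]) for t in candidates}
    return max(samples, key=samples.get)

def pick_document(topic):
    """Phase 2: uncertainty sampling within the chosen topic's pool."""
    labels = [y for _, y in judged[topic]]
    if len(set(labels)) < 2:      # not enough label diversity to train yet
        return int(rng.integers(len(pools[topic])))
    clf = LogisticRegression().fit(
        vectorizer.transform([d for d, _ in judged[topic]]), labels)
    probs = clf.predict_proba(
        vectorizer.transform([d for d, _ in pools[topic]]))[:, 1]
    return int(np.argmin(np.abs(probs - 0.5)))  # most uncertain document

budget = 6
for _ in range(budget):
    topic = pick_topic()
    idx = pick_document(topic)
    doc, rel = pools[topic].pop(idx)  # "judge" the document (oracle label here)
    judged[topic].append((doc, rel))
    alpha[topic] += rel               # bandit reward: 1 if judged relevant
    beta[topic] += 1 - rel
    print(f"judged {doc!r} for {topic}: relevant={rel}")
```

In this sketch the bandit reward is simply whether the judged document turned out relevant, so topics that keep yielding relevant documents are revisited more often, while the within-topic selector spends judgments on the documents the classifier is least sure about.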