Constructing Test Collections using Multi-armed Bandits and Active Learning
Abstract: While test collections provide the cornerstone of system-based evaluation in information retrieval, human relevance judging has become prohibitively expensive as collections have grown ever larger. Consequently, intelligently deciding which documents to judge has become increasingly important. We propose a two-phase approach to intelligent judging across topics which does not require document rankings from a shared task. In the first phase, we dynamically select the next topic to judge via a multi-armed bandit method. In the second phase, we employ active learning to select which document to judge next for that topic. Experiments on three TREC collections (with varying scarcity of relevant documents) achieve τ ≈ 0.90 correlation on the P@10 ranking and find 90% of the relevant documents at 48% of the original budget. To support reproducibility and follow-on work, we have shared our code online.
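To make the two-phase loop concrete, the sketch below walks through one possible instantiation. The abstract does not name the specific bandit or active-learning strategies, so this example assumes Thompson sampling with Beta priors for topic selection and uncertainty sampling with a per-topic logistic regression over TF-IDF features for document selection; the toy corpus and reward definition are illustrative only, not the paper's actual setup.

```python
"""Minimal sketch of a two-phase judging loop (assumed methods, toy data)."""
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy pools: topic -> list of (doc_text, true_relevance) standing in for
# the unjudged documents of each topic (hypothetical data).
pools = {
    "t1": [("neural ranking models", 1), ("cooking pasta recipes", 0),
           ("bm25 retrieval baseline", 1), ("holiday travel tips", 0)],
    "t2": [("query expansion methods", 1), ("gardening in spring", 0),
           ("football match report", 0), ("stock market news", 0)],
}
judged = {t: [] for t in pools}   # (doc_text, label) pairs judged so far
alpha = {t: 1.0 for t in pools}   # Beta posterior: relevant judgments + 1
beta = {t: 1.0 for t in pools}    # Beta posterior: non-relevant judgments + 1
vectorizer = TfidfVectorizer().fit(
    [d for docs in pools.values() for d, _ in docs])

def pick_topic():
    """Phase 1: Thompson sampling -- sample a relevance rate per topic."""
    candidates = [t for t in pools if pools[t]]
    samples = {t: rng.beta(alpha[t], beta[t]) for t in candidates}
    return max(samples, key=samples.get)

def pick_document(topic):
    """Phase 2: uncertainty sampling within the chosen topic's pool."""
    labels = [y for _, y in judged[topic]]
    if len(set(labels)) < 2:      # not enough label diversity to train yet
        return int(rng.integers(len(pools[topic])))
    clf = LogisticRegression().fit(
        vectorizer.transform([d for d, _ in judged[topic]]), labels)
    probs = clf.predict_proba(
        vectorizer.transform([d for d, _ in pools[topic]]))[:, 1]
    return int(np.argmin(np.abs(probs - 0.5)))  # most uncertain document

budget = 6
for _ in range(budget):
    topic = pick_topic()
    idx = pick_document(topic)
    doc, rel = pools[topic].pop(idx)  # "judge" the document (oracle label here)
    judged[topic].append((doc, rel))
    alpha[topic] += rel               # bandit reward: 1 if judged relevant
    beta[topic] += 1 - rel
    print(f"judged {doc!r} for {topic}: relevant={rel}")
```

In this sketch the bandit reward is simply whether the judged document turned out relevant, so topics that keep yielding relevant documents are revisited more often, while the within-topic selector spends judgments on the documents the classifier is least sure about.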