Weighted AUReC: Handling Skew in Shard Map Quality Estimation for Selective Search

Published: 01 Jan 2024, Last Modified: 30 Apr 2024 | ECIR (4) 2024 | CC BY-SA 4.0
Abstract: In selective search, a document collection is partitioned into a set of topical index shards. The AUReC measure was introduced to efficiently estimate the topical coherence (or quality) of a shard map. AUReC assumes that shards are of similar size, an assumption that is violated in practice, even for unsupervised approaches. The problem may be amplified when supervised labelling approaches with skewed class distributions are used. To estimate the quality of such unbalanced shard maps, we introduce a weighted adaptation of the AUReC measure and empirically evaluate its effectiveness on the ClueWeb09B and Gov2 datasets. We show that it closely matches the evaluations of the original AUReC when shards are similar in size, but better captures the differences in performance when shard sizes are skewed.
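
The abstract does not define the weighting, so the following is only an illustrative sketch and not the paper's formulation. It assumes that AUReC is the area under a recall curve obtained by visiting shards in descending order of how many of a query's top-ranked documents they hold, and that a size-aware variant could advance the x-axis by each shard's share of the collection instead of by an equal step per shard. The function name `aurec_sketch`, its parameters, and the toy numbers are all hypothetical.

```python
from typing import Optional, Sequence


def aurec_sketch(
    relevant_counts: Sequence[int],
    shard_sizes: Optional[Sequence[int]] = None,
) -> float:
    """Area under a per-query recall curve over shards (illustrative only).

    relevant_counts[i]: number of the query's top-ranked documents in shard i.
    shard_sizes[i]: number of documents in shard i. If None, every shard gets
    an equal x-axis step (the unweighted assumption).
    """
    n = len(relevant_counts)
    total_relevant = sum(relevant_counts)
    if n == 0 or total_relevant == 0:
        return 0.0
    if shard_sizes is None:
        shard_sizes = [1] * n
    total_size = sum(shard_sizes)

    # Visit shards from most to fewest top-ranked documents.
    order = sorted(range(n), key=lambda i: relevant_counts[i], reverse=True)

    area = 0.0
    recall = 0.0
    for i in order:
        width = shard_sizes[i] / total_size  # assumed x-axis step for shard i
        new_recall = recall + relevant_counts[i] / total_relevant
        area += width * (recall + new_recall) / 2.0  # trapezoid rule
        recall = new_recall
    return area


counts = [8, 2, 0, 0]  # toy data: top-ranked documents per shard
print(aurec_sketch(counts))                        # equal steps per shard
print(aurec_sketch(counts, [100, 100, 100, 100]))  # balanced shard sizes
print(aurec_sketch(counts, [700, 100, 100, 100]))  # one dominant shard
```

On this toy input the first two calls agree (about 0.825), while the skewed map scores lower (about 0.57) because the shard holding most of the top-ranked documents also contains most of the collection. This mirrors the behaviour the abstract describes, under the stated assumptions.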