Abstract: While many multidimensional models of relevance have been posited, prior studies have been largely exploratory rather than confirmatory. Lacking a methodological framework to quantify the relationships among factors or measure model fit to observed data, many past models could not be empirically tested or falsified. To enable more positivist experimentation, Xu and Chen [77] proposed a psychometric framework for multidimensional relevance modeling. However, we show their framework exhibits several methodological limitations which could call into question the validity of findings drawn from it. In this work, we identify and address these limitations, scale their methodology via crowdsourcing, and describe quality control methods from psychometrics which stand to benefit crowdsourcing IR studies in general. Methodology we describe for relevance judging is expected to benefit both human-centered and systems-centered IR.
0 Replies
Loading