Large Language Models as Assessors: On the Impact of Relevance Scales

Riccardo Zamolo, Riccardo Lunardi, Michael Soprano, Gianluca Demartini, Stefano Mizzaro, Kevin Roitero

Published: 2026, Last Modified: 07 May 2026, ECIR (2) 2026, CC BY-SA 4.0
Abstract: Traditionally, relevance judgments have relied on human annotators, but recent advances in Large Language Models (LLMs) have prompted growing interest in using them as a proxy for human assessors. In this setting, a key yet underexplored factor is the choice of relevance scale. Relevance scales range from binary to fine-grained, yet their impact on the effectiveness of LLM-based judgments, the effects of converting between scales, and their role in the presence of potential data contamination remain unclear.