Abstract: Lowering the numerical precision of model parameters and computations is widely adopted to improve the efficiency of retrieval systems. However, when computing relevance scores between queries and documents at low precision, we observe \textit{spurious ties} due to the reduced score granularity. These ties introduce high variability in the results depending on how they are resolved, making evaluation less reliable. To address this, we propose a more robust retrieval evaluation protocol designed to reduce score variation. It consists of: (1) High-Precision Scoring (HPS), which upcasts the final scoring step to higher precision to resolve tied candidates at minimal computational cost; and (2) Tie-aware Retrieval Metrics (TRM), which report expected scores, range, and bias to quantify ranking uncertainty. Experiments with multiple models and three scoring functions on two retrieval datasets demonstrate that HPS dramatically reduces tie-induced instability and that TRM accurately recovers expected metric values. Together, they enable a more robust and consistent evaluation of low-precision retrieval systems.
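To make the two components concrete, below is a minimal Python sketch of how HPS and one tie-aware metric could be realized, assuming a dense bi-encoder whose relevance score is a query-document dot product. The function names (`hps_rerank`, `expected_recall_at_k`) and the float16/float64 precision pair are illustrative assumptions, not details taken from the paper.

```python
# Sketch: High-Precision Scoring (HPS) and a tie-aware expected metric.
# Assumes dot-product relevance over precomputed embeddings; names and
# precisions are hypothetical, chosen only to illustrate the protocol.
import numpy as np

def low_precision_scores(q, D):
    """Score in float16; the coarse value grid makes distinct documents
    collide onto the same score, producing spurious ties."""
    return D.astype(np.float16) @ q.astype(np.float16)

def hps_rerank(q, D, k=10):
    """HPS: score everything in low precision, then upcast only the
    candidates tied at the rank-k cutoff to float64 to break the ties."""
    lp = low_precision_scores(q, D)
    order = np.argsort(-lp, kind="stable")
    cutoff = lp[order[k - 1]]                 # assumes len(D) >= k
    cand = np.flatnonzero(lp >= cutoff)       # top-k plus all of its ties
    hp = D[cand].astype(np.float64) @ q.astype(np.float64)
    return cand[np.argsort(-hp, kind="stable")][:k]

def expected_recall_at_k(scores, relevant, k=10):
    """Tie-aware metric: when a tied group straddles rank k, each tied
    document fills one of the remaining slots with equal probability,
    so we report the expectation over tie-break orderings."""
    order = np.argsort(-scores, kind="stable")
    hits, rank, i = 0.0, 0, 0
    while i < len(order) and rank < k:
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            j += 1                            # group of equal scores
        group = order[i:j]
        slots = min(k - rank, len(group))
        rel_in_group = sum(int(d in relevant) for d in group)
        hits += rel_in_group * slots / len(group)
        rank += slots
        i = j
    return hits / max(len(relevant), 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.standard_normal((1000, 64)).astype(np.float32)
    q = rng.standard_normal(64).astype(np.float32)
    print("HPS top-10:", hps_rerank(q, D, k=10))
    print("E[recall@10]:",
          expected_recall_at_k(low_precision_scores(q, D), {3, 7, 42}))
```

In this sketch the upcast touches only the candidates tied at the rank-k cutoff, so the extra float64 work scales with the tie-group size rather than the corpus size, and the expected metric averages over all orderings of each tied group instead of committing to one arbitrary tie-break.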
Paper Type: Short
Research Area: Information Retrieval and Text Mining
Research Area Keywords: passage retrieval, dense retrieval, re-ranking, metrics, evaluation methodologies
Contribution Types: NLP engineering experiment, Approaches to low-compute settings (efficiency), Theory
Languages Studied: English
Submission Number: 274