Navigating the Rashomon Set: The Impact of Score Distributions and Decision Thresholds on Model Agreement

Published: 02 Mar 2026, Last Modified: 14 Apr 2026
Venue: AFAA 2026 Poster
License: CC BY 4.0
Track: Main Papers Track (6 to 9 pages)
Keywords: Predictive multiplicity, classification, decision thresholds
TL;DR: We relate predictive multiplicity in binary classification to the distribution of residuals.
Abstract: The existence of multiple equally accurate models for the same dataset, known as the Rashomon effect, has recently attracted the attention of the machine learning community. This multiplicity lets practitioners select models that are accurate and that also satisfy other objectives, such as fairness or interpretability. However, the disagreement of models on individual samples, measured by ambiguity, can reduce the credibility of decisions. In this paper, we theoretically study the ambiguity of models obtained from a randomized training procedure by relating it to the distribution of residuals and, more particularly, to the occurrence of large residuals. Based on our theoretical results, we present a simple yet effective approach for threshold selection that reduces ambiguity at a low cost to accuracy. We also present an adapted loss for binary classification that reduces ambiguity by controlling the tail of the distribution of residuals. In experiments on five datasets, our methodology reduced ambiguity at a low cost in terms of both accuracy and computational resources.
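Illustration (not from the paper): a minimal sketch of how ambiguity might be computed and how it can depend on the decision threshold. It assumes ambiguity is the fraction of samples on which at least two models assign different labels, which is one common definition and may differ from the paper's exact one. The function name `ambiguity` and the toy scores are hypothetical.

```python
def ambiguity(scores, threshold=0.5):
    """Fraction of samples on which at least two models disagree.

    scores: list of per-model score lists, one score per sample, each in [0, 1].
    Assumed definition: a sample counts as ambiguous if the thresholded
    labels are not identical across all models.
    """
    n_samples = len(scores[0])
    conflicted = 0
    for j in range(n_samples):
        # Set of distinct labels the models assign to sample j.
        labels = {model_scores[j] >= threshold for model_scores in scores}
        if len(labels) > 1:
            conflicted += 1
    return conflicted / n_samples


# Made-up scores for 3 hypothetical models on 4 samples.
scores = [
    [0.10, 0.45, 0.55, 0.90],
    [0.20, 0.55, 0.60, 0.80],
    [0.15, 0.48, 0.40, 0.95],
]

amb_default = ambiguity(scores, threshold=0.5)  # scores straddle 0.5 on two samples
amb_shifted = ambiguity(scores, threshold=0.7)  # no sample straddles 0.7
```

In this toy case, moving the threshold away from the region where scores cluster removes the disagreement. The sketch does not account for the accuracy cost of shifting the threshold, which the abstract identifies as the other side of the trade-off.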
Submission Number: 30