Axiomatic Characterization of the Hamming and Jaccard Distances

05 May 2025 (modified: 29 Oct 2025). Submitted to NeurIPS 2025. License: CC BY-NC-ND 4.0
Keywords: Hamming Distance, Jaccard Distance, Axioms, Elections, Participatory Budgeting, Voting, Preferences
TL;DR: We characterize the Hamming and Jaccard distances using a comprehensive axiom system, which highlights their similarities and differences and identifies their shortcomings.
Abstract: Measures of dissimilarity between a pair of objects can play a pivotal role in many machine learning tasks such as clustering, outlier detection, or data visualization. In this paper, we focus on data in the form of binary vectors and analyze several methods of measuring dissimilarity between them. We introduce several properties, *axioms*, that a measure of dissimilarity can satisfy and characterize the *Hamming* and *Jaccard* distances as the only measures satisfying particular subsets of our axioms. Based on our analysis, we identify shortcomings of both distances and propose novel approaches that are better suited for certain applications. We complement our theoretical findings with an extensive empirical study. Our primary motivation is the analysis of election data, in which the votes take the form of binary approvals of alternatives, but the applicability of our results reaches far beyond that.
Supplementary Material: zip
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 7264
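
For readers unfamiliar with the two distances named in the abstract, the following is a minimal sketch of their standard textbook definitions on binary vectors. It is not taken from the paper: the function names, the unnormalized form of the Hamming distance, and the convention that two all-zero vectors have Jaccard distance 0 are assumptions made for illustration, and the paper's own definitions may differ.

```python
from typing import Sequence


def hamming_distance(u: Sequence[int], v: Sequence[int]) -> int:
    """Number of positions where the two binary vectors differ (unnormalized)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(1 for a, b in zip(u, v) if bool(a) != bool(b))


def jaccard_distance(u: Sequence[int], v: Sequence[int]) -> float:
    """1 - |intersection| / |union| of the sets of positions set to 1."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    union = sum(1 for a, b in zip(u, v) if a or b)
    if union == 0:
        # Assumed convention: two all-zero vectors are treated as identical.
        return 0.0
    intersection = sum(1 for a, b in zip(u, v) if a and b)
    return 1.0 - intersection / union


# Two approval ballots over four alternatives (hypothetical data).
ballot_a = [1, 1, 0, 0]
ballot_b = [1, 0, 1, 0]
print(hamming_distance(ballot_a, ballot_b))  # 2 positions differ
print(jaccard_distance(ballot_a, ballot_b))  # 1 - 1/3
```

Under these definitions the Hamming distance counts only disagreements, while the Jaccard distance also depends on how many alternatives both ballots approve, so pairs of ballots with the same number of disagreements can have different Jaccard distances.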