Ranking evaluation metrics from a group-theoretic perspective

TMLR Paper 1366 Authors

11 Jul 2023 (modified: 17 Sept 2024) · Rejected by TMLR · CC BY 4.0
Abstract: The search for ever better-performing machine learning techniques requires continuous comparison with well-established methods. When facing the challenge of finding the right evaluation metric to demonstrate a proposed model's strengths, choosing one metric over another may hide the method's weaknesses, intentionally or not. Conversely, a single metric that fits all applications probably does not exist, and searching for one is hopeless. In several applications, comparing rankings is a serious challenge: a variety of metrics, each closely tied to its original context, have emerged to evaluate their similarities and differences. However, many of these metrics have spread to other areas, often without a complete understanding of their inner workings, leading to unexpected results and misuse. Furthermore, since distinct metrics focus on different aspects and characteristics of rankings, the comparisons of model outputs produced by the various metrics are often contradictory. We propose to formalize rankings using the mathematical machinery of symmetric groups, rising above any context-specific interpretation of the evaluation metrics. We prove that contradictory evaluations frequently arise between pairs of metrics, introduce the agreement ratio to measure the frequency of such disagreements, and formally define essential mathematical properties for ranking evaluation metrics. We finally check whether any of these metrics is a distance in the mathematical sense. In conclusion, our analysis pinpoints the reasons for these inconsistencies, compares the metrics on purely mathematical grounds, and allows for a more informed choice based on specific needs.
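
To make the abstract's notions concrete, below is a minimal sketch (not the authors' code) of how rankings can be treated as elements of a symmetric group and how one plausible reading of the agreement ratio could be computed: the fraction of ranking pairs on which two metrics agree about which ranking is closer to a reference. The metric choices (Kendall tau distance and Spearman footrule), the identity reference, and the exhaustive enumeration over S_4 are illustrative assumptions, not the paper's definitions.

```python
# Sketch: rankings as permutations; an assumed "agreement ratio" between two
# ranking metrics, counting how often they order pairs of rankings the same
# way relative to the identity permutation.
from itertools import permutations


def kendall_tau_distance(p, q):
    """Number of discordant pairs between permutations p and q."""
    n = len(p)
    pos_q = {v: i for i, v in enumerate(q)}
    r = [pos_q[v] for v in p]  # positions in q, read in p's order
    return sum(1 for i in range(n) for j in range(i + 1, n) if r[i] > r[j])


def spearman_footrule(p, q):
    """Sum of absolute rank displacements between p and q."""
    pos_q = {v: i for i, v in enumerate(q)}
    return sum(abs(i - pos_q[v]) for i, v in enumerate(p))


def agreement_ratio(metric_a, metric_b, n=4):
    """Fraction of ranking pairs on which the two metrics induce the same
    comparison against the identity (both prefer the same ranking, or both
    declare a tie). This is one assumed formalization for illustration."""
    identity = tuple(range(n))
    perms = list(permutations(identity))
    agree = total = 0
    for p in perms:
        for q in perms:
            if p >= q:  # consider each unordered pair once
                continue
            a = metric_a(p, identity) - metric_a(q, identity)
            b = metric_b(p, identity) - metric_b(q, identity)
            total += 1
            # Same sign (or both zero) means the metrics agree on this pair.
            if (a > 0 and b > 0) or (a < 0 and b < 0) or (a == 0 and b == 0):
                agree += 1
    return agree / total


if __name__ == "__main__":
    # An agreement ratio below 1.0 exhibits concrete ranking pairs on which
    # the two metrics give contradictory evaluations.
    print(agreement_ratio(kendall_tau_distance, spearman_footrule))
```

Both metrics above are genuine distances on the symmetric group (non-negative, symmetric, and satisfying the triangle inequality), which is precisely the kind of property the paper proposes to verify for ranking evaluation metrics in general.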
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: All changes in the revised version are highlighted in red.
Assigned Action Editor: ~Jaakko_Peltonen1
Submission Number: 1366