To Each Metric Its Decoding: Post-Hoc Optimal Decision Rules of Probabilistic Hierarchical Classifiers

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We propose an optimal decoding framework for hierarchical classification, deriving decision rules that optimize hierarchical metrics.
Abstract: Hierarchical classification incorporates the notion of mistake severity by leveraging a structured label hierarchy. However, decoding in such settings frequently relies on heuristic decision rules, which may not align with task-specific evaluation metrics. In this work, we propose a framework for the optimal decoding of an output probability distribution with respect to a target metric. We derive optimal decision rules for increasingly complex prediction settings, providing universal algorithms when candidates are limited to the set of nodes. In the most general case of predicting a *subset of nodes*, we focus on rules dedicated to the hierarchical $\mathrm{hF}_{\beta}$ scores, which are tailored to hierarchical settings. To demonstrate the practical utility of our approach, we conduct extensive empirical evaluations, showcasing the superiority of our proposed optimal strategies, particularly in underdetermined scenarios. These results highlight the potential of our methods to enhance the performance and reliability of hierarchical classifiers in real-world applications.
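To illustrate the kind of decoding the abstract describes, here is a minimal sketch (not the paper's implementation) of Bayes-optimal single-node decoding against an $\mathrm{hF}_{\beta}$ score: given a classifier's probability distribution over leaves, pick the candidate node maximizing the expected score. The toy hierarchy, the `parent` map, and the probability values are illustrative assumptions; the $\mathrm{hF}_{\beta}$ definition used (precision/recall over ancestor sets, including the node itself and excluding the root) is one common convention.

```python
# Toy hierarchy (assumed for illustration):
# root -> {animal, vehicle}; animal -> canine; canine -> {dog, wolf}; vehicle -> car.
parent = {"animal": "root", "vehicle": "root", "canine": "animal",
          "dog": "canine", "wolf": "canine", "car": "vehicle"}

def ancestors(node):
    """Ancestor set of `node`, including itself, excluding the root."""
    anc = set()
    while node != "root":
        anc.add(node)
        node = parent[node]
    return anc

def hF_beta(pred, true, beta=1.0):
    """Hierarchical F_beta between a predicted node and a true leaf."""
    ap, at = ancestors(pred), ancestors(true)
    inter = len(ap & at)
    if inter == 0:
        return 0.0
    hp, hr = inter / len(ap), inter / len(at)  # hierarchical precision / recall
    return (1 + beta**2) * hp * hr / (beta**2 * hp + hr)

def decode(leaf_probs, candidates, beta=1.0):
    """Return the candidate node maximizing the expected hF_beta under leaf_probs."""
    return max(candidates,
               key=lambda c: sum(p * hF_beta(c, y, beta)
                                 for y, p in leaf_probs.items()))

# Underdetermined case: probability mass split between the two canine leaves.
probs = {"dog": 0.45, "wolf": 0.45, "car": 0.10}
print(decode(probs, list(parent), beta=0.5))  # -> "canine"
```

With a precision-leaning score (beta = 0.5), the optimal decision backs off to the internal node "canine" rather than gambling on either leaf, which is the kind of metric-dependent behavior a fixed heuristic rule cannot reproduce.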
Lay Summary: Many real-world tasks require sorting items into categories that form a hierarchy—think of sorting photos first by "animals" or "vehicles," then by subcategories like "dogs" or "cars." In such setups, some mistakes matter more than others (e.g., calling a wolf a dog is certainly less serious than calling it a car), but existing methods for selecting categories from an AI system's probability estimates usually rely on simple "if-then" rules that don't line up with how we actually judge performance. We introduce a clear, metric-driven framework that finds the best possible way to choose categories from an AI's probability estimates. We work out exact optimal predictions for different scenarios—from picking a single node in the hierarchy to selecting several at once—so that the system optimizes directly for the scores we care about, including specialized scores that balance different kinds of errors. By using these optimal strategies, hierarchical classifiers perform noticeably better, especially in ambiguous cases. This advance can improve real-world applications—such as medical diagnosis, document organization, or wildlife monitoring—by reducing costly hierarchical misclassifications.
Link To Code: https://github.com/RomanPlaud/hierarchical_decision_rules
Primary Area: General Machine Learning->Evaluation
Keywords: Hierarchical Classification, Optimal Decoding
Submission Number: 12830