Trusting the Untrustworthy: A Cautionary Tale on the Pitfalls of Training-based Rejection Option

TMLR Paper1068 Authors

18 Apr 2023 (modified: 18 Jun 2023) · Rejected by TMLR
Abstract: We consider the problem of selective classification, also known as classification with a rejection option. We first analyze state-of-the-art methods that involve a training phase to produce a selective classifier capable of determining when it should abstain from making a decision. Although only some of these frameworks require changes to the basic architecture of the classifier, by adding a module for selection, all of them require modifying the standard training procedure and loss function for classification. Crucially, we observe two types of limitations affecting these methods: on the one hand, they exhibit poor performance in terms of selective risk and coverage on some classes, which are not necessarily the hardest to classify; on the other hand, surprisingly, the classes on which they attain low performance vary with the model initialization. Additionally, some of these methods also decrease the accuracy of the final classification. We discuss the limitations of each framework, demonstrating that these shortcomings occur for a wide range of models and datasets. We establish a mathematical connection between the problem of detecting misclassification errors and risk minimization for selective classification, and propose a statistical test that requires no training and can be applied to pre-trained standard classifiers to equip them with a rejection option.
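To make the quantities the abstract refers to concrete, here is a minimal sketch of the two metrics used to evaluate a selective classifier: coverage (fraction of inputs on which the classifier makes a decision) and selective risk (error rate measured only on those accepted inputs). This is an illustrative post-hoc confidence-thresholding rule, not the statistical test proposed in the paper; all names and the toy data are hypothetical.

```python
# Illustrative sketch only (not the paper's method): a post-hoc rejection
# rule that accepts a pretrained classifier's prediction when its confidence
# score exceeds a threshold, then reports coverage and selective risk.

def selective_classify(confidences, predictions, labels, threshold):
    """Accept a prediction only when its confidence is >= `threshold`.

    Returns (coverage, selective_risk): the fraction of accepted inputs,
    and the error rate computed on the accepted inputs only.
    """
    accepted = [(p, y) for c, p, y in zip(confidences, predictions, labels)
                if c >= threshold]
    coverage = len(accepted) / len(predictions)
    if not accepted:
        return coverage, 0.0  # no decisions made, so no selective errors
    errors = sum(p != y for p, y in accepted)
    return coverage, errors / len(accepted)

# Toy example: 5 inputs with hypothetical confidence scores.
conf = [0.95, 0.60, 0.80, 0.55, 0.99]
pred = [0, 1, 0, 0, 2]
true = [0, 0, 1, 1, 2]

cov, risk = selective_classify(conf, pred, true, threshold=0.7)
# Three inputs clear the threshold (coverage 0.6); one of them is
# misclassified, so the selective risk is 1/3.
```

Raising the threshold trades coverage for lower selective risk; the paper's point is that training-based methods can distribute this trade-off very unevenly across classes, and in an initialization-dependent way.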
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Sivan_Sabato1
Submission Number: 1068