On the consistent estimation of optimal Receiver Operating Characteristic (ROC) curve

Published: 31 Oct 2022, Last Modified: 30 Dec 2022, NeurIPS 2022 Accept, Readers: Everyone
Keywords: Classification, optimal Receiver Operating Characteristic (ROC) curve, Consistency, Model misspecification
TL;DR: Under both correct and incorrect model specification, we compare three commonly used methods for estimating the optimal ROC curve in terms of consistency.
Abstract: Under a standard binary classification setting with possible model misspecification, we study the problem of estimating a general Receiver Operating Characteristic (ROC) curve, which is an arbitrary set of false positive rate (FPR) and true positive rate (TPR) pairs. We formally introduce the notion of the optimal ROC curve over a general model space. We argue that any ROC curve estimation method implemented over a given model space should target the optimal ROC curve over that space. Three popular ROC curve estimation methods are then analyzed at the population level (i.e., when there is an infinite number of samples) under both correct and incorrect model specification. Our analysis shows that all three are consistent when the surrogate loss function satisfies certain conditions and the given model space includes all measurable classifiers. Interestingly, some of these conditions are similar to those required to ensure classification consistency. When the model space is incorrectly specified, however, we show that only one method leads to consistent estimation of the optimal ROC curve over the chosen model space. We present numerical results that demonstrate the effects of model misspecification on the ROC curve estimates produced by the various methods.
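To make the abstract's description of an ROC curve as a set of (FPR, TPR) pairs concrete, here is a minimal sketch that computes empirical pairs by thresholding a real-valued score. It is only an illustration and is not one of the three estimation methods analyzed in the paper; the function name `empirical_roc_pairs` and the toy data are assumptions introduced for this example.

```python
def empirical_roc_pairs(scores, labels):
    """Return (FPR, TPR) pairs from thresholding `scores` (hypothetical helper).

    An example is predicted positive when its score is >= the threshold.
    Labels are assumed to be 1 (positive) or 0 (negative).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # An infinite threshold predicts nothing positive, giving the point (0, 0);
    # lowering the threshold through each distinct score traces the curve.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)

    pairs = []
    for t in thresholds:
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        pairs.append((fp / n_neg, tp / n_pos))
    return pairs


# Toy data, made up for illustration only.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_pairs = empirical_roc_pairs(toy_scores, toy_labels)
```

Each pair corresponds to one classifier obtained from the score function. The sketch only shows the objects being estimated; it does not address which model space or surrogate loss produces the score.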
Supplementary Material: pdf