AUC Maximization under Positive Distribution Shift

Published: 25 Sept 2024, Last Modified: 06 Nov 2024NeurIPS 2024 posterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: AUC maximization, distribution shift, domain adaptation, PU learning, imbalanced data
TL;DR: In this paper, we propose a method for maximizing the AUC under the positive distribution shift by using labeled positive and unlabeled data in the training distribution and unlabeled data in the test distribution.
Abstract: Maximizing the area under the receiver operating characteristic curve (AUC) is a popular approach to imbalanced binary classification problems. Existing AUC maximization methods usually assume that training and test distributions are identical. However, this assumption is often violated in practice due to {\it a positive distribution shift}, where the negative-conditional density does not change but the positive-conditional density can vary. This shift often occurs in imbalanced classification since positive data are often more diverse and time-varying than negative data. To deal with this shift, we theoretically show that the AUC on the test distribution can be expressed by using the positive and marginal training densities and the marginal test density. Based on this result, we can maximize the AUC on the test distribution by using positive and unlabeled data in the training distribution and unlabeled data in the test distribution. The proposed method requires only positive labels in the training distribution as supervision. Moreover, the derived AUC has a simple form and thus is easy to implement. The effectiveness of the proposed method is shown with four real-world datasets.
Primary Area: Other (please use sparingly, only use the keyword field for more details)
Submission Number: 8293
Loading