Bayesian Quantification with Black-Box Estimators

Published: 05 Jun 2024 · Last Modified: 05 Jun 2024 · Accepted by TMLR · License: CC BY-SA 4.0
Abstract: Understanding how different classes are distributed in an unlabeled data set is important for the calibration of probabilistic classifiers and for uncertainty quantification. Methods like adjusted classify and count, black-box shift estimators, and invariant ratio estimators use an auxiliary, potentially biased black-box classifier trained on a different data set to estimate the class distribution on the current data set, and they yield asymptotic guarantees under weak assumptions. We demonstrate that these algorithms are closely related to inference in a particular probabilistic graphical model approximating the assumed ground-truth generative process, and we propose a Bayesian estimator. We then discuss an efficient Markov chain Monte Carlo sampling scheme for the introduced model and prove an asymptotic consistency guarantee in the large-data limit. We compare the introduced model against the established point estimators in a variety of scenarios and show that it is competitive with, and in some cases superior to, the non-Bayesian alternatives.
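The black-box shift estimators mentioned in the abstract recover the target class distribution by solving a linear system built from the classifier's confusion rates. A minimal sketch of this idea (the function name and data layout are illustrative, not taken from the paper or its code):

```python
import numpy as np

def bbse_prevalence(preds_source, labels_source, preds_target, n_classes):
    """Estimate target class prevalences via a black-box shift estimator."""
    # C[i, j] ~ P(f(x) = i | y = j), estimated on the labeled source data.
    C = np.zeros((n_classes, n_classes))
    for j in range(n_classes):
        mask = labels_source == j
        for i in range(n_classes):
            C[i, j] = np.mean(preds_source[mask] == i)
    # mu[i] = fraction of target points the classifier assigns to class i.
    mu = np.bincount(preds_target, minlength=n_classes) / len(preds_target)
    # Under label shift, C @ pi = mu; solve, then clip and renormalize
    # so the result is a valid probability vector.
    pi = np.linalg.solve(C, mu)
    pi = np.clip(pi, 0.0, None)
    return pi / pi.sum()
```

The paper's Bayesian estimator replaces this point estimate with a posterior over the prevalence vector, so that uncertainty from a finite, possibly biased classifier propagates into the estimate.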
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: This is the final (camera-ready) version of the manuscript. It has been deanonymised and features the following minor changes:
- All experiments now report $\hat R$.
- Computing resources are described in Appendix E.
- Additional proofreading has been completed to improve clarity and readability.
Code: https://github.com/pawel-czyz/labelshift
Assigned Action Editor: ~Pavel_Izmailov1
Submission Number: 2421