Vicinal Assessment of Model Generalization

20 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Label-free Evaluation, Vicinal Risk, Model-Centric AI
Abstract: This paper studies how to assess the generalization ability of classification models on out-of-distribution test sets without relying on test ground truths. Existing works usually compute an unsupervised indicator of a certain model property, such as confidence or invariance, that is correlated with out-of-distribution accuracy. However, these indicators are generally computed on a \textit{single} test sample in isolation (and subsequently averaged over the test set), and are thus subject to spurious model responses, such as excessively high or low confidence. To address this issue, we propose to integrate the model responses of \textit{neighboring} test samples into the correctness indicator of every test sample. Intuitively, if a model consistently exhibits high correctness scores on nearby samples, the target sample is more likely to be correctly predicted as well, and vice versa. This score is then averaged across all test samples to indicate model accuracy holistically. The strategy is developed under the vicinal risk formulation and, since its computation does not rely on labels, is called the vicinal risk proxy (VRP). We show that, methodologically, VRP can be applied to existing generalization indicators such as average confidence and effective invariance, and that, experimentally, it brings consistent improvements over these baselines: stronger correlation with model accuracy is observed, especially on severe out-of-distribution test sets.
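To make the abstract's idea concrete, below is a minimal, hypothetical sketch of how a vicinal risk proxy could be computed. It is not the authors' implementation: the function name, the choice of cosine-similarity k-nearest-neighbour vicinities, the similarity weighting, and the use of a per-sample confidence score as the base correctness indicator are all illustrative assumptions.

```python
import numpy as np

def vicinal_risk_proxy(features, confidences, k=10):
    """Illustrative sketch of a label-free vicinal accuracy proxy.

    features:    (N, D) array of test-sample embeddings (no labels needed).
    confidences: (N,) array of per-sample correctness indicators,
                 e.g. max softmax confidence or effective invariance.
    k:           number of nearest neighbours defining each sample's vicinity.
    Returns a scalar intended to correlate with test accuracy.
    """
    # Cosine similarity between all pairs of test samples.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T

    vicinal_scores = np.empty(len(features))
    for i in range(len(features)):
        # The k most similar samples (plus the sample itself) form its vicinity.
        neighbours = np.argsort(-sim[i])[: k + 1]
        # Similarity-weighted average of the neighbours' correctness scores:
        # a sample surrounded by confidently predicted neighbours scores high,
        # which dampens spuriously high or low responses on isolated samples.
        weights = sim[i, neighbours]
        vicinal_scores[i] = np.average(confidences[neighbours], weights=weights)

    # Average over the whole test set -> label-free proxy for model accuracy.
    return vicinal_scores.mean()
```

In this sketch, swapping `confidences` for a different per-sample indicator (e.g. an invariance-based score) corresponds to applying the vicinal averaging on top of other generalization indicators, as the abstract describes.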
Supplementary Material: pdf
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2550