Keywords: robustness, OOD performance estimation, foundation model safety
Abstract: Estimating out-of-distribution (OOD) performance is critical to safely deploying machine learning models. Recently, Baek et al. showed that the ``agreement-on-the-line'' phenomenon can be leveraged to reliably predict the OOD accuracy of models in an ensemble consisting largely of CNNs trained from scratch. However, it is now increasingly common to lightly fine-tune foundation models, and it is unclear whether such fine-tuning produces enough diversity in model predictions for agreement-based methods to work properly. In this paper, we develop methods for reliably applying agreement-on-the-line-based performance estimation to fine-tuned foundation models. In particular, we first consider the case of fine-tuning a single foundation model, where we extensively study how different sources of randomness (linear head initialization, data shuffling, and data subsetting) contribute to agreement-on-the-line in the resulting model sets. Somewhat surprisingly, we find that it is possible to obtain strong agreement via random initialization of the linear head alone. Next, we show that _multiple_ foundation models, pretrained on different datasets but fine-tuned on the same task, also exhibit agreement-on-the-line. Again rather surprisingly, despite the diversity of such models, they all lie on the same agreement line. Taken together, these methods enable reliable and efficient estimation of OOD accuracy for fine-tuned foundation models, without requiring any labeled OOD data.
Submission Number: 101