Towards Calibrated Robust Fine-Tuning of Vision-Language Models

Published: 28 Oct 2023, Last Modified: 02 Apr 2024 · DistShift 2023 Poster
Keywords: calibration, robustness, distribution shift, vision-language model, fine-tuning, foundation model
TL;DR: We initiate the investigation of the calibration of VLMs after fine-tuning under distribution shifts and introduce simple yet effective approaches to reduce calibration error.
Abstract: While fine-tuning unleashes the potential of a pre-trained model on a specific task, it trades off the model’s generalization capability on out-of-distribution (OOD) datasets. To mitigate this, robust fine-tuning aims to ensure performance on OOD datasets as well as on the in-distribution (ID) dataset for which the model is tuned. However, another criterion for reliable machine learning (ML), confidence calibration, is overlooked despite its increasing demand in real-world high-stakes ML applications (e.g., autonomous driving). For the first time, we raise concerns about the calibration of fine-tuned vision-language models (VLMs) by showing that naive fine-tuning and even state-of-the-art robust fine-tuning methods hurt the calibration of pre-trained VLMs, especially on OOD datasets. To address this, we provide a simple approach, called calibrated robust fine-tuning (CaRot), that incentivizes calibration and robustness on both ID and OOD datasets. Empirical results on ImageNet-1K distribution shift evaluation verify the effectiveness of our method.
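The "calibration error" the abstract refers to is conventionally measured with the expected calibration error (ECE): predictions are binned by confidence, and the gap between average confidence and accuracy is averaged over bins, weighted by bin size. The sketch below is a minimal standard ECE implementation for illustration; it is not the paper's CaRot method, and the function name and binning choice (15 equal-width bins) are this sketch's assumptions.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Binned ECE: weighted average of |accuracy - confidence|
    over equal-width confidence bins (hypothetical helper)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += (mask.sum() / n) * gap
    return ece

# Toy example: a model that says 90% confidence and is right 90% of
# the time is perfectly calibrated, so its ECE is zero.
conf = np.full(10, 0.9)
corr = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 0])  # 9/10 correct
print(round(expected_calibration_error(conf, corr), 4))  # → 0.0
```

A miscalibrated model (e.g., high confidence but low accuracy on OOD data, as the paper observes after naive fine-tuning) would instead yield a large ECE.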
Submission Number: 99