Detecting Covariate Shifts With Vision-Language Foundation Models

Published: 06 Mar 2025, Last Modified: 26 Mar 2025 · ICLR 2025 FM-Wild Workshop · CC BY 4.0
Keywords: out-of-distribution detection, covariate shifts, CLIP, vision-language models
TL;DR: We reframe covariate shift detection as an out-of-distribution detection problem and evaluate vision-language models on it.
Abstract:

Deployed machine learning models often encounter significant challenges in the wild due to distribution shifts, where inputs deviate from the training distribution. Covariate shifts, a specific type of distribution shift, have traditionally been addressed with robustness-focused approaches; however, existing models still suffer substantial performance degradation under such conditions. In this work, we propose reframing covariate shift detection as an out-of-distribution (OOD) detection problem. We leverage vision-language models (VLMs), particularly CLIP, to detect covariate shifts using zero-shot detection techniques that require no task-specific training. To facilitate this effort, we introduce ImageNet-CS, a comprehensive benchmark comprising six covariate-shifted datasets derived from ImageNet. Our results demonstrate that VLMs outperform traditional supervised methods in detecting covariate shifts, underscoring their promise for improving the reliability of models deployed in the real world.
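The zero-shot detection techniques mentioned above typically score an image by comparing its embedding against text embeddings of the in-distribution class names, flagging inputs whose best match is weak. A minimal sketch of one common score of this kind (an MCM-style maximum softmax over temperature-scaled cosine similarities) is shown below; the function names are illustrative, synthetic vectors stand in for real CLIP image/text embeddings, and the temperature value is an assumption, not a detail taken from the paper.

```python
import numpy as np

def normalize(v):
    # Unit-normalize along the last axis, as CLIP-style embeddings are compared by cosine similarity.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def mcm_score(image_emb, text_embs, temperature=0.01):
    """Zero-shot OOD score in the style of Maximum Concept Matching (MCM).

    image_emb:  (d,)  embedding of the test image
    text_embs:  (k, d) embeddings of the k in-distribution class prompts
    Returns the max softmax probability over temperature-scaled cosine
    similarities; lower values suggest a (covariate-)shifted input.
    """
    sims = normalize(text_embs) @ normalize(image_emb)  # (k,) cosine similarities
    logits = sims / temperature
    probs = np.exp(logits - logits.max())               # stable softmax
    probs /= probs.sum()
    return float(probs.max())

# Illustrative usage with synthetic embeddings standing in for CLIP outputs.
rng = np.random.default_rng(0)
class_prompts = rng.normal(size=(5, 64))                  # hypothetical text embeddings
in_dist_image = class_prompts[0] + 0.05 * rng.normal(size=64)  # close to class 0
shifted_image = rng.normal(size=64)                       # unrelated to any class
print(mcm_score(in_dist_image, class_prompts), mcm_score(shifted_image, class_prompts))
```

In practice a threshold on this score (chosen on held-in validation data) separates in-distribution inputs from covariate-shifted ones; the detector needs no task-specific training beyond the frozen VLM.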

Submission Number: 10