Prompting for Robustness: Extracting Robust Classifiers from Foundation Models

Published: 05 Mar 2024, Last Modified: 08 May 2024
Venue: ICLR 2024 R2-FM Workshop Poster
License: CC BY 4.0
Keywords: distribution shift, group robustness, spurious correlations, robustness, prompting, foundation models
Abstract: Machine learning models can fail when trained on distributions with hidden confounders (attributes spuriously correlated with the label) and tested on distributions where such correlations are absent. While numerous algorithmic solutions have been explored for such distribution shifts, a surprisingly effective way to empirically improve robustness on some other types of shift (e.g., ImageNet and its distribution shifts) is to use stronger open-vocabulary classifiers derived from foundation models. In this work, we note that for more controlled shifts governed by spurious correlations, the zero-shot and few-shot performance of foundation models is no better than that of models trained with empirical risk minimization (ERM), and remains unchanged when the pretraining data or model size is scaled. However, even in those situations, foundation models are quite accurate at predicting possible confounders. We leverage this observation to propose Prompting for Robustness (PfR), which first uses a foundation model to zero-shot predict the confounder on given labeled examples, and then learns a classifier with balanced performance across the resulting groups. In a simplified setup, we theoretically analyze the zero-shot behavior of multimodal models, explaining how contrastive pretraining can learn features that strongly couple the confounder with more robust features. Across five vision and language tasks, we show that PfR's performance nearly matches that of an oracle algorithm (group DRO) that leverages labeled spurious attributes.
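The abstract describes a two-stage procedure: zero-shot predict the confounder with a foundation model, then train a classifier whose performance is balanced across (label, predicted confounder) groups. The sketch below illustrates one way such a pipeline could look; it is not the authors' implementation. The use of CLIP via Hugging Face transformers, the prompt templates, the group-DRO exponentiated-weights update, and all hyperparameters are assumptions chosen for illustration.

```python
# Illustrative sketch of a PfR-style pipeline (assumptions, not the paper's code):
# Stage 1: zero-shot confounder prediction with a CLIP-style model.
# Stage 2: group-DRO training over (label, predicted confounder) groups.
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


@torch.no_grad()
def predict_confounder(images, confounder_prompts):
    # Score each image against natural-language descriptions of candidate
    # confounder values (e.g., ["a photo on land", "a photo on water"])
    # and take the argmax as the predicted confounder.
    inputs = processor(text=confounder_prompts, images=images,
                       return_tensors="pt", padding=True).to(device)
    logits = clip(**inputs).logits_per_image      # [batch, num_prompts]
    return logits.argmax(dim=-1)                  # predicted confounder per image


def group_dro_step(model, optimizer, x, y, groups, group_weights, eta=0.01):
    # One group-DRO update: compute per-group losses, upweight the currently
    # worst groups (exponentiated-gradient step), minimize the weighted loss.
    logits = model(x)
    per_example = F.cross_entropy(logits, y, reduction="none")
    group_losses = torch.stack([
        per_example[groups == g].mean() if (groups == g).any()
        else torch.zeros((), device=x.device)
        for g in range(group_weights.numel())
    ])
    with torch.no_grad():
        group_weights *= torch.exp(eta * group_losses)
        group_weights /= group_weights.sum()
    loss = (group_weights * group_losses).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, groups would be formed as `group = label * num_confounder_values + predicted_confounder`, and a held-out worst-group metric would be the natural model-selection criterion; both choices are illustrative assumptions rather than details taken from the abstract.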
Submission Number: 69