Keywords: foundational model, subpopulation shift, bias identification, interpretable AI
TL;DR: Identifying biases by analysing the weights of fine-tuned foundational models.
Abstract: Datasets and pre-trained models come with intrinsic biases. Most existing methods spot them by analyzing misclassified samples in a semi-automated, human-in-the-loop validation process. In contrast, we propose ConceptDrift, a method that analyzes the weights of a linear probe learned on top of a foundation model. We capitalize on the weight update trajectory, which starts from the embedding of the textual representation of the class and drifts towards embeddings that disclose hidden biases.
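The drift idea can be illustrated with a toy sketch. It assumes a softmax linear probe without an intercept, initialized with the class-name text embeddings and trained by plain gradient descent on frozen features. It then ranks a vocabulary of candidate concept embeddings by cosine similarity to each class's weight displacement. All vectors are synthetic stand-ins (not real foundation-model embeddings), and the helpers `train_linear_probe` and `rank_concepts_by_drift` are hypothetical names. The scoring rule is an illustrative assumption, not necessarily the one ConceptDrift uses.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit L2 norm along `axis`."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def train_linear_probe(features, labels, w_init, lr=0.5, steps=200):
    """Full-batch gradient descent on softmax cross-entropy for a linear probe
    without intercept, starting from `w_init` (one row per class)."""
    w = w_init.copy()
    n, num_classes = features.shape[0], w.shape[0]
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = features @ w.T
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot).T @ features / n  # gradient of mean CE w.r.t. w
        w -= lr * grad
    return w


def rank_concepts_by_drift(w_init, w_final, concept_names, concept_embs):
    """Rank candidate concepts per class by cosine similarity between their
    embeddings and the weight displacement w_final - w_init (assumed scoring)."""
    drift = l2_normalize(w_final - w_init)
    sims = drift @ l2_normalize(concept_embs).T  # shape: (classes, concepts)
    order = np.argsort(-sims, axis=1)
    return [[concept_names[j] for j in row] for row in order]


# Toy, Waterbirds-like setup: all vectors are synthetic, not real model embeddings.
rng = np.random.default_rng(0)
d = 16
basis, _ = np.linalg.qr(rng.normal(size=(d, 4)))  # four orthonormal "text embeddings"
concept_embs = basis.T
concept_names = ["landbird", "waterbird", "water background", "land background"]
landbird, waterbird, water_bg, land_bg = concept_embs

# Biased training set: bird identity is a weak signal, background a strong one.
n = 100
x_land = 0.3 * landbird + land_bg + 0.05 * rng.normal(size=(n, d))
x_water = 0.3 * waterbird + water_bg + 0.05 * rng.normal(size=(n, d))
features = l2_normalize(np.vstack([x_land, x_water]))
labels = np.array([0] * n + [1] * n)

w_init = np.stack([landbird, waterbird])  # probe starts at the class-name embeddings
w_final = train_linear_probe(features, labels, w_init)
ranking = rank_concepts_by_drift(w_init, w_final, concept_names, concept_embs)
print("waterbird drift ->", ranking[1][0])
print("landbird drift  ->", ranking[0][0])
```

Because the background dominates the synthetic features, each class weight moves mostly along its spurious background direction rather than its own class-name direction. That movement is the kind of signal the drift analysis is meant to expose.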
Unlike prior work, this approach lets us pinpoint unwanted correlations in a dataset, providing more than just possible explanations for wrong predictions. We empirically demonstrate the efficacy of our method by significantly improving zero-shot performance with bias-augmented prompting. Our method is not bound to a single modality: in this work we experiment with both image datasets (Waterbirds, CelebA, NICO++) and a text dataset (CivilComments).
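The abstract does not specify how prompts are augmented. A minimal sketch follows, assuming that each class keeps its plain prompt, that one extra prompt per discovered bias concept is appended, and that the normalized text embeddings are averaged into a zero-shot classifier (standard prompt ensembling). `toy_encode_text` is a hypothetical bag-of-words stand-in for a real text encoder. The template string and the bias concepts are placeholders, not results from the paper.

```python
import numpy as np

D = 16
_rng = np.random.default_rng(0)
_word_vectors = {}


def toy_encode_text(text):
    """Hypothetical stand-in for a foundation model's text encoder:
    a bag-of-words sum of fixed random word vectors (NOT a real model)."""
    vec = np.zeros(D)
    for word in text.lower().replace(",", " ").split():
        if word not in _word_vectors:
            _word_vectors[word] = _rng.normal(size=D)
        vec += _word_vectors[word]
    return vec


def build_bias_augmented_prompts(class_names, bias_concepts, template="a photo of a {}"):
    """Plain class prompt plus one prompt per bias concept of that class (assumed format)."""
    prompts = {}
    for name in class_names:
        base = template.format(name)
        extra = [f"{base}, {concept}" for concept in bias_concepts.get(name, [])]
        prompts[name] = [base] + extra
    return prompts


def zero_shot_classifier(prompts, encode_text):
    """One weight row per class: the normalized mean of its normalized prompt embeddings."""
    rows = []
    for texts in prompts.values():
        embs = np.stack([encode_text(t) for t in texts])
        embs /= np.linalg.norm(embs, axis=1, keepdims=True)
        mean = embs.mean(axis=0)
        rows.append(mean / np.linalg.norm(mean))
    return np.stack(rows)


def predict(image_embs, weights):
    """Zero-shot prediction: index of the class row with the highest cosine similarity."""
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    return np.argmax(image_embs @ weights.T, axis=1)


class_names = ["landbird", "waterbird"]
# Placeholder bias concepts per class (hypothetical, not taken from the paper).
bias_concepts = {"landbird": ["land background"], "waterbird": ["water background"]}
prompts = build_bias_augmented_prompts(class_names, bias_concepts)
weights = zero_shot_classifier(prompts, toy_encode_text)
image_embs = _rng.normal(size=(5, D))  # placeholder image embeddings
print(prompts["waterbird"])
print(predict(image_embs, weights))
```

With a real model, `toy_encode_text` and `image_embs` would be replaced by the foundation model's text and image encoders. Averaging normalized embeddings keeps every prompt's contribution on the same scale.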
Track: Main track
Submitted Paper: No
Published Paper: No
Submission Number: 74