Racial and Gender Stereotypes Encoded Into CLIP Representations

Published: 19 Mar 2024, Last Modified: 12 May 2024
Venue: Tiny Papers @ ICLR 2024 Archive
License: CC BY 4.0
Keywords: Vision Language Models, Bias, Fairness
TL;DR: We analyze the racial and gender disparities inherent to CLIP's understanding of human images.
Abstract: OpenAI’s CLIP is a vision-language model widely used in current state-of-the-art architectures. This paper analyzes racial and gender biases present in CLIP’s representations of human images. We evaluate images from the FairFace dataset, grouped by race and gender, against a series of traits describing demeanor, intelligence, and character. We find that CLIP’s association of these traits with images is heavily influenced by race and gender, suggesting that these social biases propagate into the many architectures built on CLIP.
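The evaluation the abstract describes can be approximated with standard zero-shot CLIP scoring: compare each face image's embedding against text prompts for each trait, then aggregate scores per demographic group. Below is a minimal sketch of that setup, assuming the Hugging Face `transformers` CLIP interface; the trait list and prompt template here are illustrative placeholders, not the paper's exact ones.

```python
# Sketch of zero-shot trait scoring with CLIP, assuming the HF transformers API.
# Traits and the prompt template are hypothetical examples, not the paper's own.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Illustrative traits spanning demeanor, intelligence, and character.
traits = ["friendly", "hostile", "intelligent",
          "unintelligent", "trustworthy", "dishonest"]
prompts = [f"a photo of a {t} person" for t in traits]

def trait_scores(image: Image.Image) -> dict[str, float]:
    """Return CLIP's softmax probability over the trait prompts for one image."""
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = outputs.logits_per_image.softmax(dim=-1).squeeze(0)
    return dict(zip(traits, probs.tolist()))
```

Under this setup, disparities would surface by averaging each trait's probability within each FairFace race/gender group and comparing the group means.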
Supplementary Material: zip
Submission Number: 184