Racial and Gender Stereotypes Encoded Into CLIP Representations

Published: 19 Mar 2024, Last Modified: 12 May 2024
Venue: Tiny Papers @ ICLR 2024 Archive
License: CC BY 4.0
Keywords: Vision Language Models, Bias, Fairness
TL;DR: We analyze the racial and gender disparities inherent to CLIP's understanding of human images.
Abstract: OpenAI’s CLIP is a vision-language model widely used in current state-of-the-art architectures. This paper analyzes racial and gender biases present in CLIP’s representations of human images. We evaluate images from the FairFace dataset, grouped by race and gender, against a series of traits describing demeanor, intelligence, and character. We find that CLIP’s association of these traits with images is heavily influenced by race and gender, suggesting that these social biases propagate into the many architectures built on CLIP.
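The evaluation the abstract describes can be approximated with standard zero-shot CLIP scoring: compare each face image's embedding against text prompts for each trait, then aggregate scores per demographic group. Below is a minimal sketch of that setup, assuming the Hugging Face `transformers` CLIP interface; the trait list and prompt template here are illustrative placeholders, not the paper's exact ones.

```python
# Sketch of zero-shot trait scoring with CLIP, assuming the HF transformers API.
# Traits and the prompt template are hypothetical examples, not the paper's own.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Illustrative traits spanning demeanor, intelligence, and character.
traits = ["friendly", "hostile", "intelligent",
          "unintelligent", "trustworthy", "dishonest"]
prompts = [f"a photo of a {t} person" for t in traits]

def trait_scores(image: Image.Image) -> dict[str, float]:
    """Return CLIP's softmax probability over the trait prompts for one image."""
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = outputs.logits_per_image.softmax(dim=-1).squeeze(0)
    return dict(zip(traits, probs.tolist()))
```

Under this setup, disparities would surface by averaging each trait's probability within each FairFace race/gender group and comparing the group means.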
Supplementary Material: zip
Submission Number: 184