Blind Men and the Elephant: Diverse Perspectives on Gender Stereotypes in Benchmark Datasets

ACL ARR 2025 May Submission8089 Authors

20 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0

Abstract: Accurately measuring gender-stereotypical bias in language models is a complex task with many hidden aspects. Current benchmarks underestimate this multifaceted challenge and fail to capture the full extent of the problem. This paper examines the inconsistencies between intrinsic stereotype benchmarks. We propose that currently available benchmarks may each capture different aspects of gender stereotypes rather than providing truly comprehensive measurements. Using StereoSet and CrowS-Pairs as case studies, we investigate how data distribution affects benchmark results. By applying a framework from social psychology to balance the data of these benchmarks across various components of gender stereotypes, we demonstrate that even simple balancing techniques can significantly improve the correlation between different measurement approaches. Our findings underscore the complexity of gender stereotyping in language models and point to new directions for developing more refined techniques to detect and reduce bias.

Paper Type: Long

Research Area: Computational Social Science and Cultural Analytics

Research Area Keywords: language/cultural bias analysis

Contribution Types: Data analysis

Languages Studied: English

Submission Number: 8089
