Keywords: bias, machine learning, nlp, detectors
TL;DR: We introduce BAID, a large-scale benchmark for evaluating the fairness of AI text detectors across seven sociolinguistic dimensions, revealing systematic biases in how detectors classify text from different subgroups.
Abstract: AI-generated text detectors have recently gained adoption in educational and professional contexts. Prior research has uncovered isolated cases of bias, particularly against English Language Learners (ELLs); however, systematic evaluation of such systems across broader sociolinguistic factors is lacking. In this work, we propose BAID, a comprehensive framework for evaluating AI detectors across various types of bias. As part of the framework, we introduce over 200k samples spanning seven major categories: demographics, age, educational grade level, dialect, formality, political leaning, and topic. We also generate synthetic versions of each sample with carefully crafted prompts that preserve the original content while reflecting subgroup-specific writing styles. Using this benchmark, we evaluate four open-source state-of-the-art AI text detectors and find consistent disparities in detection performance, particularly low recall for texts from underrepresented groups. Our contributions provide a scalable, transparent approach to auditing AI detectors and emphasize the need for bias-aware evaluation before these tools are deployed for public use.
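To make the headline metric concrete, below is a minimal sketch of how per-subgroup recall (the fraction of AI-generated texts a detector correctly flags) could be computed over a benchmark like BAID. The sample schema, `detect_fn` interface, and threshold are illustrative assumptions, not the paper's actual data format or detector API.

```python
from collections import defaultdict

def subgroup_recall(samples, detect_fn, threshold=0.5):
    """Compute recall on AI-generated samples, broken down by subgroup.

    samples   : iterable of dicts with keys "text", "is_ai", "subgroup"
                (hypothetical schema; BAID's real format may differ)
    detect_fn : callable mapping a text to an AI-likelihood score in [0, 1]
    """
    hits = defaultdict(int)    # AI texts correctly flagged, per subgroup
    totals = defaultdict(int)  # AI texts seen, per subgroup

    for s in samples:
        if not s["is_ai"]:
            continue  # recall is defined over AI-generated texts only
        totals[s["subgroup"]] += 1
        if detect_fn(s["text"]) >= threshold:
            hits[s["subgroup"]] += 1

    # Per-subgroup recall; gaps between groups indicate disparate performance
    return {g: hits[g] / totals[g] for g in totals}

if __name__ == "__main__":
    # Toy detector scoring by repetitiveness (1 - type/token ratio);
    # purely illustrative, real detectors are neural classifiers.
    toy = [
        {"text": "the the the cat", "is_ai": True, "subgroup": "ELL"},
        {"text": "a varied rich sentence", "is_ai": True, "subgroup": "native"},
    ]
    score = lambda t: 1 - len(set(t.split())) / len(t.split())
    print(subgroup_recall(toy, score, threshold=0.3))
```

A detector that is fair in this sense would yield roughly equal recall across subgroups; large gaps of the kind the abstract reports would show up directly in the returned dictionary.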
Submission Number: 1