AutoBiasTest: Controllable Test Sentence Generation for Open-Ended Social Bias Testing in Language Models at Scale
Keywords: social bias, language models, social bias testing, generative AI
TL;DR: We propose a novel framework for controllable test sentence generation for stereotypical bias testing in large language models, which helps discover previously unseen biases.
Abstract: Social bias in Pretrained Language Models (PLMs) affects text generation and other downstream NLP tasks. Existing bias testing methods rely predominantly on manual templates or on expensive crowd-sourced data. We propose AutoBiasTest, a novel method that automatically generates controlled sentences for testing bias in PLMs, providing a flexible and low-cost alternative. Our approach uses another PLM as the generator, with generation controlled by conditioning on social group and attribute terms. We show that the generated sentences are natural and similar to human-produced content in terms of word length and diversity. We find that our bias scores correlate well with those obtained from manual templates, but AutoBiasTest also highlights biases that these templates do not capture, because its contexts are more diverse and realistic. By automating large-scale test sentence generation, we enable better estimation of underlying bias distributions.
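As a rough illustration of what conditioning on social group and attribute terms could look like, the sketch below builds one generation prompt per (group, attribute) pair and checks that a candidate sentence actually mentions both terms. The prompt wording, the function names (`build_prompt`, `make_prompts`, `mentions_both`), and the example terms are all assumptions made for illustration; the abstract does not specify the conditioning format or any filtering step used by AutoBiasTest.

```python
import re
from itertools import product


def build_prompt(group: str, attribute: str) -> str:
    # Hypothetical prompt format (an assumption, not the paper's scheme).
    return f"Write one natural sentence that mentions '{group}' and '{attribute}'."


def make_prompts(groups, attributes):
    # One prompt for every (group term, attribute term) combination.
    return {(g, a): build_prompt(g, a) for g, a in product(groups, attributes)}


def mentions_both(sentence: str, group: str, attribute: str) -> bool:
    # A sentence counts as "controlled" here only if both conditioning terms
    # appear as whole words (case-insensitive). This check is an assumption.
    def has(term: str) -> bool:
        pattern = rf"\b{re.escape(term)}\b"
        return re.search(pattern, sentence, flags=re.IGNORECASE) is not None

    return has(group) and has(attribute)


if __name__ == "__main__":
    prompts = make_prompts(["teachers", "engineers"], ["patient", "creative"])
    for pair, prompt in prompts.items():
        print(pair, "->", prompt)
```

In such a setup, each prompt would be passed to the generator PLM, and only outputs for which `mentions_both` returns true would be kept as test sentences for that pair.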
Submission Number: 32