Abstract: Much research has sought to evaluate the degree to which large language models reflect social biases. We complement such work with an approach to elucidating the connections between language model predictions and people's social attitudes. We show how word preferences in a large language model reflect social attitudes about gender, using two datasets from human experiments that found differences in gendered or gender-neutral word choices by participants with differing views on gender (progressive, moderate, or conservative). We find that the language model BERT takes into account factors that shape human lexical choice of such language, but may not weigh those factors in the same way people do. Moreover, we show that BERT's predictions most resemble responses from participants with moderate to conservative views on gender. Such findings illuminate how a language model: (1) may differ from people in how it deploys words that signal gender, and (2) may prioritize some social attitudes over others.
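As a minimal sketch of the kind of probe the abstract describes (not the authors' protocol), the snippet below reads off BERT's preference among gendered and gender-neutral words via masked-token probabilities. It assumes the Hugging Face transformers library and the bert-base-uncased checkpoint; the stimulus sentence and candidate words are hypothetical illustrations, not items from the paper's datasets.

```python
# Sketch: compare BERT's masked-token probabilities for gendered vs.
# gender-neutral word choices. Assumes `transformers` and `torch` installed.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def mask_probability(template: str, word: str) -> float:
    """Return the probability BERT assigns to `word` at the [MASK] slot."""
    inputs = tokenizer(template, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits[0, mask_pos], dim=-1)
    return probs[tokenizer.convert_tokens_to_ids(word)].item()

# Hypothetical stimulus: a slot where speakers choose a gendered or a
# gender-neutral pronoun; sentence and word list are illustrative only.
template = "Every student should bring [MASK] own lunch."
for word in ("his", "her", "their"):
    print(f"{word}: {mask_probability(template, word):.4f}")
```

In the study design the abstract summarizes, probabilities like these would be compared against the word-choice rates of human participants grouped by their views on gender.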