Abstract: In recent years, various methods have been proposed to evaluate gender bias in large language models (LLMs).
A key challenge lies in the transferability of bias measurement methods initially developed for the English language when applied to other languages.
This work aims to contribute to this research strand by presenting five German datasets for gender bias evaluation in LLMs.
The datasets are grounded in well-established concepts of gender bias and can be used with multiple evaluation methodologies.
Our findings, reported for eight multilingual LLMs, reveal unique challenges associated with gender bias in German, including the ambiguous interpretation of male occupational terms and the influence of seemingly neutral nouns on gender perception.
This work contributes to the understanding of gender bias in LLMs across languages and underscores the necessity for tailored evaluation frameworks.
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: model bias/fairness evaluation, prompting, corpus creation, benchmarking, evaluation methodologies, metrics
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: German
Submission Number: 4031