Abstract: Large language models (LLMs) are rapidly being adopted in workplace settings. However, because they are trained on massive, unregulated internet datasets, LLMs may reflect or even amplify social biases and stereotypes. This study presents a framework for auditing bias in LLM-based career recommendations that covers multiple social groups, a range of education and specialization backgrounds, and hundreds of real-world occupations. Using an LLM-generated career recommendation dataset together with a large-scale real-world employment dataset, we conducted a comprehensive evaluation of GPT-4.1 and found substantial stereotype bias and misalignment. In particular, the LLM's recommendations for majority groups align more closely with both the neutral-group recommendations and the corresponding real occupation distributions, indicating that directly deploying such systems in employment processes may exacerbate occupational stereotypes and further entrench invisible social barriers.
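The abstract describes the audit as a comparison between group-conditioned LLM recommendation distributions and a neutral or real-world reference distribution. Below is a minimal sketch of what such a distributional comparison could look like; the occupation list, group labels, and the use of Jensen-Shannon distance are illustrative assumptions and not details confirmed by the submission.

```python
# Minimal sketch (not the paper's exact pipeline): compare the occupation
# distribution an LLM recommends for a demographic group against a reference
# distribution (a neutral prompt or real employment data). All names and the
# Jensen-Shannon distance metric below are hypothetical illustrations.
from collections import Counter
import math


def occupation_distribution(recommendations, occupations):
    """Turn a list of recommended occupation strings into a probability vector."""
    counts = Counter(recommendations)
    total = sum(counts.get(o, 0) for o in occupations) or 1
    return [counts.get(o, 0) / total for o in occupations]


def js_distance(p, q):
    """Jensen-Shannon distance between two probability vectors (0 = identical)."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0 and y > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return math.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))


# Hypothetical example: recommendations elicited with a neutral prompt vs.
# a prompt naming a specific social group, scored over the same occupation set.
occupations = ["nurse", "software engineer", "teacher", "mechanic"]
neutral_recs = ["software engineer", "teacher", "nurse", "software engineer"]
group_recs = ["nurse", "teacher", "nurse", "teacher"]

p = occupation_distribution(neutral_recs, occupations)
q = occupation_distribution(group_recs, occupations)
print(f"JS distance from neutral baseline: {js_distance(p, q):.3f}")
```

In this framing, a smaller distance for one group than another would indicate that the model's recommendations for that group track the reference distribution more closely, mirroring the alignment gap the abstract reports.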
Paper Type: Short
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: model bias/fairness evaluation, ethical considerations in NLP applications, reflections and critiques
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 6128