Small Changes, Large Consequences: Analyzing the Allocational Fairness of LLMs in Hiring Contexts

Published: 24 Sept 2025 · Last Modified: 24 Sept 2025
NeurIPS 2025 LLM Evaluation Workshop Poster · CC BY 4.0
Keywords: bias, fairness, harms, evaluation, real-world applications
Abstract: Large language models (LLMs) are increasingly deployed in high-stakes settings such as hiring, yet their potential for unfair decision-making remains understudied in generation and retrieval settings. In this work, we examine the allocational fairness of LLM-based hiring systems through two tasks that reflect real HR usage: resume summarization and applicant ranking. Using a synthetic resume dataset with demographic perturbations and a set of curated job postings, we find that generated summaries exhibit meaningful differences more frequently for race than for gender perturbations. Retrieval models, in turn, exhibit high ranking sensitivity to both gender and race perturbations, and can be as sensitive to demographic changes as to non-demographic ones. Overall, our results indicate that LLM-based hiring systems, especially at the retrieval stage, can exhibit notable biases that lead to discriminatory outcomes.
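To make the perturbation-based evaluation concrete, the following is a minimal illustrative sketch, not the paper's actual pipeline: it ranks resumes against a job posting with an off-the-shelf embedding model, applies a single demographic perturbation (a name swap, here a stand-in for the paper's perturbation scheme), and checks whether the ranking changes. The model name, resumes, and perturbation are all assumptions for illustration.

```python
# Illustrative sketch (not the paper's implementation): measure the ranking
# sensitivity of a retrieval model to a demographic perturbation of a resume.
# Assumes the `sentence-transformers` package; the model and data are stand-ins.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

job_posting = "Seeking a software engineer with 5+ years of Python experience."
resumes = [
    "Emily Baker: 6 years of Python backend development at a fintech startup.",
    "Raj Patel: 4 years of Java development, some Python scripting.",
    "Chris Lee: 7 years of Python and distributed systems experience.",
]

def rank(job: str, docs: list[str]) -> list[int]:
    """Return resume indices sorted by cosine similarity to the job posting."""
    job_emb = model.encode(job, convert_to_tensor=True)
    doc_embs = model.encode(docs, convert_to_tensor=True)
    scores = util.cos_sim(job_emb, doc_embs)[0]
    return sorted(range(len(docs)), key=lambda i: -scores[i])

baseline = rank(job_posting, resumes)

# Demographic perturbation: change only the applicant's name on one resume,
# leaving the qualifications untouched.
perturbed = list(resumes)
perturbed[0] = resumes[0].replace("Emily Baker", "Lakisha Washington")
after = rank(job_posting, perturbed)

print("baseline ranking:", baseline)
print("perturbed ranking:", after)
# Any difference between the two orderings is ranking sensitivity attributable
# solely to the name swap, since everything else is held fixed.
```

A full evaluation in the spirit of the paper would repeat this over many resumes, job postings, and perturbation types (gender, race, and non-demographic controls) and aggregate the rank changes, rather than inspecting a single swap.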
Submission Number: 182