Abstract: Recently, Large Language Models (LLMs) have demonstrated a superior ability to serve as ranking models. However, concerns have arisen that LLMs may exhibit discriminatory ranking behaviors based on users' sensitive attributes (\eg gender). Worse still, in this paper we identify a subtler form of discrimination in LLMs, termed \textit{implicit ranking unfairness}, where LLMs exhibit discriminatory ranking patterns based solely on non-sensitive user profiles, such as user names. Such implicit unfairness is more widespread yet less noticeable, threatening the ethical foundation of LLM-based ranking. To comprehensively explore this unfairness, our analysis focuses on three research aspects: (1) we propose an evaluation method to investigate the severity of implicit ranking unfairness; (2) we uncover the causes of such unfairness; and (3) to mitigate it effectively, we employ a pair-wise regression method to conduct fairness-aware data augmentation for LLM fine-tuning. Experiments demonstrate that our method outperforms existing methods in terms of ranking fairness. Lastly, we emphasize the need for the community to identify and mitigate implicit ranking unfairness, aiming to avert the potential deterioration of the reinforced human-LLM ecosystem.
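To make the notion of implicit ranking unfairness concrete, the following is a minimal illustrative sketch, not the paper's evaluation protocol: it compares the rankings an LLM would produce for identical requests that differ only in the user name, using pairwise Kendall-tau distance as a divergence proxy. The helper names (`kendall_tau_distance`, `implicit_unfairness_score`) and the toy names and items are hypothetical placeholders for this sketch.

```python
from itertools import combinations
from typing import Dict, List


def kendall_tau_distance(ranking_a: List[str], ranking_b: List[str]) -> float:
    """Fraction of item pairs ordered differently by the two rankings
    (0 = identical order, 1 = fully reversed)."""
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    items = list(pos_a)
    discordant = sum(
        1
        for x, y in combinations(items, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    n_pairs = len(items) * (len(items) - 1) / 2
    return discordant / n_pairs if n_pairs else 0.0


def implicit_unfairness_score(rankings_by_name: Dict[str, List[str]]) -> float:
    """Average pairwise ranking divergence across user names: higher values mean
    the ranking depends more strongly on the (non-sensitive) name alone."""
    pairs = list(combinations(rankings_by_name, 2))
    if not pairs:
        return 0.0
    return sum(
        kendall_tau_distance(rankings_by_name[a], rankings_by_name[b])
        for a, b in pairs
    ) / len(pairs)


# Toy usage: the same item set ranked by a hypothetical LLM for two user names.
rankings = {
    "Emily": ["item_1", "item_2", "item_3", "item_4"],
    "Jamal": ["item_3", "item_1", "item_4", "item_2"],
}
print(implicit_unfairness_score(rankings))  # 0.5 here; 0.0 would mean name-invariant rankings
```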
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: Implicit Unfairness, Ranking, Large Language Models
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 348