Keywords: Large Language Model, Social Bias, Video Games, LLM Application
Abstract: Large Language Models (LLMs) have increasingly enhanced or replaced traditional Non-Player Characters (NPCs) in video games.
However, these LLM-based NPCs inherit underlying social biases (e.g., race or class), posing fairness risks during in-game interactions.
Because this issue remains largely unexplored, we introduce FairGamer, the first benchmark for evaluating social biases across three in-game interaction patterns: transaction, cooperation, and competition. FairGamer assesses four bias types (class, race, age, and nationality) across 12 distinct evaluation tasks using a novel metric, FairMCV. Our evaluation of seven frontier LLMs reveals that:
(1) models exhibit biased decision-making, with Grok-4-Fast demonstrating the highest bias (average FairMCV = 76.9%); and
(2) larger LLMs display more severe social biases, suggesting that increased model capacity inadvertently amplifies them.
We release FairGamer at https://github.com/Anonymous999-xxx/FairGamer to facilitate future research on NPC fairness.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: benchmarking, evaluation methodologies, evaluation, metrics, model bias/fairness evaluation, language/cultural bias analysis, bias/toxicity, applications
Contribution Types: NLP engineering experiment, Data resources, Data analysis
Languages Studied: English, Chinese
Submission Number: 1126