Wait, Am I Being Fair? Characterizing Deductive Stereotyping and Mitigating It with Fair-GCG
Keywords: fairness, stereotype, deductive stereotype
TL;DR: We characterize LLMs' deductive stereotyping and propose Fair-GCG to mitigate it.
Abstract: Warning: This paper contains several toxic and offensive statements.
While reasoning generally improves fairness in recent large language models (LLMs), failures persist. In this work, we identify a dominant failure mode, deductive stereotyping, in which models apply population-level statistical regularities to individual cases, producing logically coherent yet socially biased inferences. We provide a statistical interpretation of this phenomenon. To steer models toward fairness-aware reasoning, we propose a reasoning-time injection framework. We further introduce Fair-GCG to systematically discover effective injection phrases. Injection phrases discovered by Fair-GCG improve performance across multiple fairness benchmarks, generalize from smaller to larger LLMs, and transfer to real-world fairness-sensitive tasks.
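The abstract describes Fair-GCG as a systematic search for effective injection phrases. The toy sketch below illustrates the general shape of a GCG-style greedy coordinate search: it is a simplification under stated assumptions, not the paper's method. The vocabulary, the `fairness_score` objective, and the loop structure are all hypothetical stand-ins; the actual Fair-GCG objective would score candidates against an LLM rather than a hand-written function.

```python
# Toy sketch of a GCG-style greedy coordinate search for an injection
# phrase. Everything here is illustrative: the vocabulary, the scoring
# function, and the loop are assumptions, not the paper's Fair-GCG
# objective (which would query a model, not a hand-written heuristic).

VOCAB = ["fairly", "individual", "evidence", "assume", "consider",
         "stereotype", "avoid", "each", "case", "only"]

def fairness_score(phrase):
    """Stand-in objective: counts tokens we (arbitrarily) deem
    fairness-promoting. A real objective would score model outputs."""
    good = {"fairly", "individual", "evidence", "avoid", "each", "case"}
    return sum(tok in good for tok in phrase)

def greedy_coordinate_search(init_phrase, steps=20):
    phrase = list(init_phrase)
    for _ in range(steps):
        improved = False
        # Sweep over positions; at each, try every candidate token
        # and keep the best-scoring swap.
        for i in range(len(phrase)):
            best_tok, best = phrase[i], fairness_score(phrase)
            for tok in VOCAB:
                cand = phrase[:i] + [tok] + phrase[i + 1:]
                if fairness_score(cand) > best:
                    best_tok, best = tok, fairness_score(cand)
            if best_tok != phrase[i]:
                phrase[i] = best_tok
                improved = True
        if not improved:  # converged: no swap improves the score
            break
    return phrase

init = ["assume", "stereotype", "only", "consider"]
result = greedy_coordinate_search(init)
```

The coordinate structure (one token position updated at a time, scored against a fixed objective) is the key idea the sketch preserves; a gradient-guided variant would restrict the candidate tokens per position using token-embedding gradients instead of enumerating the full vocabulary.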
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 1