Uncertainty Drives Social Bias Changes in Quantized Large Language Models

Published: 01 Mar 2026, Last Modified: 24 Apr 2026 · ICLR 2026 AIWILD · CC BY 4.0
Keywords: model quantization, fairness, benchmarking
TL;DR: Post-training quantization causes up to 21% of LLM responses to flip between biased and unbiased, driven by model uncertainty rather than size, creating hidden asymmetric impacts across social groups that standard evaluation metrics completely miss.
Abstract: Aggregate bias metrics are fundamentally misleading for quantized large language models: quantization causes up to 21\% of individual responses to flip between biased and unbiased states while aggregate scores remain unchanged, creating invisible harms that standard evaluations fail to detect. To reveal this hidden phenomenon, we introduce PostTrainingBiasBench, a unified framework for rigorous bias evaluation, and conduct the first large-scale study of 50 quantized models across 13 closed- and open-ended benchmarks. We find these flips are strongly linked to model uncertainty: uncertain responses are 3-11$\times$ more likely to change than confident ones. Through controlled intervention via preference optimization, we establish causal evidence that uncertainty drives response flipping. Quantization strength amplifies the effect: 4-bit quantized models show 4-6$\times$ more behavioral changes than 8-bit models. Critically, these shifts asymmetrically impact demographic groups, with bias worsening by up to 18.6\% for some groups while improving by 14.1\% for others, yielding misleadingly neutral aggregate outcomes. Larger models show no consistent robustness advantage, and group-specific shifts vary unpredictably across model families. Together, our findings demonstrate that compression fundamentally reshapes bias patterns, underscoring the need for rigorous post-quantization evaluation and interventions to ensure reliability in practice.
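The abstract's central observation — that per-response flips can cancel out in an aggregate score — can be illustrated with a minimal sketch. This is not the paper's evaluation code; the labels and the simple mean-based bias score are hypothetical, chosen only to show how opposing flips leave the aggregate unchanged.

```python
# Hypothetical per-response bias labels (1 = biased, 0 = unbiased)
# for the same prompts, before and after quantization.
before = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
after  = [0, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # one 1->0 flip and one 0->1 flip

# Fraction of individual responses that changed state.
flips = sum(b != a for b, a in zip(before, after))
flip_rate = flips / len(before)

# A simple aggregate bias score: fraction of biased responses.
agg_before = sum(before) / len(before)
agg_after = sum(after) / len(after)

print(flip_rate)              # 0.2 -> 20% of responses flipped
print(agg_before, agg_after)  # 0.3 0.3 -> aggregate score is identical
```

Because the two flips move in opposite directions, the aggregate score reports no change even though 20% of individual responses behave differently — the invisibility the abstract describes.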
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 151