Abstract: Network quantization, one of the most widely studied model compression methods, converts a floating-point model into a fixed-point one with negligible accuracy loss.
Although quantization has achieved great success in reducing model size, it may exacerbate the unfairness of model accuracy across different groups within a dataset.
This paper considers two widely used algorithms, Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT), in an attempt to understand how they cause this critical issue.
Theoretical analysis, supported by empirical verification, reveals two responsible factors and shows in depth how they influence a fairness metric.
A comparison between PTQ and QAT then explains the observation that QAT behaves even worse than PTQ in terms of fairness, although it often preserves higher accuracy at lower bit-widths.
Finally, the paper shows that several simple data augmentation methods can alleviate the disparate impacts of quantization, based on the further observation that class imbalance produces distinct values of the aforementioned factors across attribute classes.
For empirical evaluation, we experiment on both imbalanced (UTK-Face and FER2013) and balanced (CIFAR-10 and MNIST) datasets using ResNet and VGG models.
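To make the two schemes concrete, here is a minimal PyTorch sketch of what PTQ and QAT typically look like; the symmetric uniform quantizer, the 4-bit default, and the straight-through estimator below are standard illustrative choices, not the exact configurations used in the paper.

```python
import torch

def uniform_quantize(x: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Symmetric uniform quantizer: round floats onto a 2^b-level fixed-point grid."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

@torch.no_grad()
def post_training_quantize(model: torch.nn.Module, num_bits: int = 4) -> None:
    """PTQ: quantize the weights of an already-trained model, with no retraining."""
    for p in model.parameters():
        p.copy_(uniform_quantize(p, num_bits))

class FakeQuantSTE(torch.autograd.Function):
    """QAT building block: quantize in the forward pass, but let gradients flow
    through the non-differentiable rounding via the straight-through estimator,
    so the model is trained while 'aware' of its quantization error."""
    @staticmethod
    def forward(ctx, x, num_bits=4):
        return uniform_quantize(x, num_bits)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # identity gradient w.r.t. x; none for num_bits
```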
Lay Summary: Compressing deep neural networks by converting 32-bit weights to 4-bit numbers makes them efficient to run on phones and cameras, but this shortcut can cause some demographic groups to suffer far larger errors than others, widening accuracy gaps.
We examined two popular compression schemes, post-training quantization (PTQ) and quantization-aware training (QAT), and found that the accuracy gap hinges on two factors: each group's gradient norm and the local curvature (Hessian trace) of the loss. Both factors spike when a group has less training data, and reducing the bit-width widens the gaps. QAT, although usually more accurate overall, amplifies this disparity even more than PTQ. Fortunately, balancing the training data with simple augmentations, such as rotating pictures or masking random patches, significantly reduces the accuracy gaps without sacrificing overall performance.
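A rough way to see why these two quantities govern the gap (our illustration, in generic notation rather than the paper's): model quantization as a small weight perturbation and Taylor-expand each group's loss.

```latex
% Quantization as a perturbation \delta = Q(w) - w of the trained weights w.
% Second-order expansion of group g's loss:
\Delta L_g \;\approx\; \nabla L_g(w)^\top \delta
  \;+\; \tfrac{1}{2}\,\delta^\top H_g\,\delta,
\qquad H_g = \nabla^2 L_g(w).
% If \delta is modeled as zero-mean noise with covariance \sigma^2 I
% (larger \sigma at lower bit-widths), the expected degradation is
\mathbb{E}[\Delta L_g] \;\approx\; \tfrac{\sigma^2}{2}\,\operatorname{tr}(H_g).
```

Under this reading, a group with a larger gradient norm suffers a larger first-order swing, a larger Hessian trace means a larger expected loss increase, and lowering the bit-width (raising \(\sigma\)) widens the gap between groups.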
Our findings provide the first theoretical and empirical guide for building quantized yet fair models, helping practitioners deploy efficient AI on phones, cameras, and other resource-limited devices while safeguarding fairness.
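Below is a minimal sketch of the kind of rebalancing augmentation described above, written with torchvision transforms; the particular transforms and parameters (15-degree rotations, erasing probability 0.5) are assumptions for illustration, not the settings used in the experiments.

```python
from torchvision import transforms

# Illustrative pipeline: rotate pictures and mask random patches, applied
# (more aggressively) to under-represented groups to rebalance training data.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),  # random rotation up to +/-15 degrees
    transforms.ToTensor(),                  # RandomErasing expects a tensor input
    transforms.RandomErasing(p=0.5),        # mask a random rectangular patch
])
```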
Primary Area: General Machine Learning->Supervised Learning
Keywords: model quantization, supervised learning, fairness
Submission Number: 6144