Unmasking the Hidden Fairness, Bias, and Safety Costs of Compression with Mixture-of-Expert Models

Published: 03 Jun 2026, Last Modified: 03 Jun 2026 | AI4GOOD Workshop 2026 Regular | Everyone | CC BY 4.0
Keywords: large language models, mixture of experts, model compression, algorithmic fairness, social bias, safety alignment, expert pruning, expert merging, quantization, weight sparsity, refusal behavior
TL;DR: Compressing MoE language models preserves many standard capabilities, but can produce uneven shifts in social bias and refusal behavior across compression methods and safety-data settings.
Abstract: Mixture-of-Experts (MoE) language models are often compressed for deployment, yet the fairness, bias, and safety effects of compression remain underexamined for these architectures, particularly when Expert Compression (EC) methods are used or when safety data is also varied. In this work, we investigate how compression changes fairness and safety behavior in MoE language models under different safety-data proportions. Across Qwen3-30B-A3B-Instruct-2507 and ERNIE-4.5-21B-A3B-PT, we find that EC is highly sensitive to safety data, with HC-SMoE expert merging producing the strongest instability overall. Quantization is generally more stable than EC, but it is not neutral, and weight sparsity can also introduce substantial shifts across the benchmark suite. We further show that benchmark-level improvements can be misleading: some heavier pruning settings appear less erratic on selected fairness benchmarks, while prompt-matched generation diagnostics still show greater behavioral drift from the dense baseline. These results suggest that fairness under MoE compression does not follow a simple monotonic pattern, and that safety-data choices should be treated as a central consideration when compressing models intended for fairer deployment.
Submission Number: 282