What’s Left After Distillation? How Knowledge Transfer Impacts Fairness and Bias

TMLR Paper 3580 Authors

28 Oct 2024 (modified: 09 Nov 2024) · Under review for TMLR · CC BY 4.0
Abstract: Knowledge Distillation is a commonly used Deep Neural Network (DNN) compression method that often maintains overall generalization performance. However, we show that even on balanced image classification datasets such as CIFAR-100, Tiny ImageNet, and ImageNet, as many as 41% of classes are statistically significantly affected by distillation when class-wise accuracy (i.e. class bias) is compared between a teacher and its distilled student, or between a distilled and a non-distilled student. Changes in class bias are not necessarily undesirable when considered outside the context of a model's intended use. Using two common fairness metrics, Demographic Parity Difference (DPD) and Equalized Odds Difference (EOD), on models trained on the CelebA, Trifeature, and HateXplain datasets, our results suggest that increasing the distillation temperature improves the distilled student's fairness, and that at high temperatures the student's fairness can even surpass that of the teacher. This study highlights the uneven effects of distillation across classes and its potentially significant role in fairness, emphasizing that caution is warranted when deploying distilled models in sensitive application domains.
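
For readers unfamiliar with the quantities named in the abstract, the sketch below is a minimal illustration (not the authors' implementation) of a standard temperature-scaled distillation loss and of how DPD and EOD could be computed with the fairlearn library for a binary task with a binary sensitive attribute (e.g. a CelebA attribute). The function names, `temperature`, and `alpha` values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from fairlearn.metrics import demographic_parity_difference, equalized_odds_difference


def distillation_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """Standard temperature-scaled knowledge distillation loss (Hinton et al., 2015).

    Higher temperatures soften the teacher's distribution, placing more weight
    on its non-target ("dark knowledge") probabilities.
    """
    soft_targets = F.softmax(teacher_logits / temperature, dim=1)
    log_student = F.log_softmax(student_logits / temperature, dim=1)
    # Scale the KL term by T^2 so its gradient magnitude stays comparable
    # to the hard-label cross-entropy term as the temperature varies.
    kd_term = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term


def fairness_gaps(y_true, y_pred, sensitive):
    """Group-fairness gaps: DPD is the largest difference in positive-prediction
    rate across groups; EOD is the largest TPR/FPR difference across groups."""
    dpd = demographic_parity_difference(y_true, y_pred, sensitive_features=sensitive)
    eod = equalized_odds_difference(y_true, y_pred, sensitive_features=sensitive)
    return dpd, eod
```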
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Yaoliang_Yu1
Submission Number: 3580