Sample-wise Adaptive Weighting for Transfer Consistency in Adversarial Distillation

TMLR Paper 7460 Authors

11 Feb 2026 (modified: 18 Feb 2026) · Under review for TMLR · CC BY 4.0
Abstract: Adversarial distillation within the standard min–max adversarial training framework aims to transfer adversarial robustness from a large, robust teacher network to a compact student. However, existing work often neglects state-of-the-art robust teachers. Through extensive analysis, we find that stronger teachers do not necessarily yield more robust students, a phenomenon known as robust saturation. While this effect is typically attributed to the capacity gap between teacher and student, we show that such explanations are incomplete. Instead, we identify adversarial transferability, the fraction of student-crafted adversarial examples that remain effective against the teacher, as a key factor in successful robustness transfer. Based on this insight, we propose Sample-wise Adaptive Adversarial Distillation (SAAD), which reweights training examples by their measured transferability without incurring additional computational cost. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet show that SAAD consistently improves AutoAttack robustness over prior methods.
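
The weighting scheme described in the abstract lends itself to a compact implementation. Below is a minimal PyTorch sketch, assuming a PGD inner maximization on the student and a temperature-scaled KL distillation loss; the function names (pgd_attack, saad_step), the {0.5, 1.0} weight values, and all hyperparameters are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Untargeted L-inf PGD on the *student*: maximize its cross-entropy loss.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def saad_step(student, teacher, x, y, tau=4.0):
    # Craft adversarial examples on the student, then distill from the
    # teacher on those examples with per-sample transferability weights.
    x_adv = pgd_attack(student, x, y)
    with torch.no_grad():
        t_logits = teacher(x_adv)  # needed for distillation regardless
        # Per-sample transfer indicator: 1 iff the student-crafted example
        # also fools the teacher. The 0.5 floor is an assumed soft scheme.
        fooled = t_logits.argmax(dim=1).ne(y).float()
        w = 0.5 + 0.5 * fooled
    s_logits = student(x_adv)
    per_sample_kl = F.kl_div(
        F.log_softmax(s_logits / tau, dim=1),
        F.softmax(t_logits / tau, dim=1),
        reduction='none',
    ).sum(dim=1)
    return (w * per_sample_kl).mean() * tau * tau  # standard T^2 scaling
```

Note that the teacher forward pass on the adversarial examples is required for distillation anyway, so deriving the weights from those same logits adds no extra computation, consistent with the abstract's claim of no additional cost.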
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Alessandro_De_Palma1
Submission Number: 7460