Keywords: Adversarial Robustness, Adversarial Training, Adversarial Distillation
Abstract: Adversarial training significantly improves adversarial robustness, but superior performance is primarily attained with large models.
The resulting performance gap for smaller models has spurred active research into adversarial distillation (AD).
Existing AD methods leverage the teacher’s logits as a guide.
In contrast to these approaches, we aim to transfer another form of knowledge from the teacher: the input gradient.
In this paper, we propose a distillation module termed Indirect Gradient Distillation Module (IGDM) that indirectly matches the student’s input gradient with that of the teacher.
Experimental results show that IGDM seamlessly integrates with existing AD methods, significantly enhancing their performance.
In particular, on the CIFAR-100 dataset without additional data augmentation, integrating IGDM into the SOTA method improves AutoAttack accuracy from 28.06% to 30.32% with the ResNet-18 architecture and from 26.18% to 29.32% with the MobileNetV2 architecture.
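The core idea of indirect gradient matching can be illustrated with a toy sketch: for a small perturbation δ, the output difference f(x + δ) − f(x) approximates J(x)·δ, so matching teacher and student output differences indirectly aligns their input gradients without computing them explicitly. The linear "models" and variable names below are hypothetical stand-ins, not the paper's actual architecture or loss.

```python
import numpy as np

# Hypothetical toy models: random linear maps standing in for
# the teacher and student networks (illustration only).
rng = np.random.default_rng(0)
W_teacher = rng.normal(size=(3, 5))
W_student = rng.normal(size=(3, 5))

def teacher(x):
    return W_teacher @ x

def student(x):
    return W_student @ x

def indirect_gradient_loss(x, delta):
    """Match output *differences* under a shared input perturbation.

    For small delta, f(x + delta) - f(x) ~= J(x) @ delta, so penalizing
    the mismatch between teacher and student differences indirectly
    aligns the student's input gradient (Jacobian) with the teacher's.
    """
    diff_t = teacher(x + delta) - teacher(x)
    diff_s = student(x + delta) - student(x)
    return float(np.mean((diff_s - diff_t) ** 2))

x = rng.normal(size=5)
delta = 0.01 * rng.normal(size=5)
print(indirect_gradient_loss(x, delta))
```

In practice this term would be added to an existing AD objective and computed with the adversarial perturbations already produced during adversarial training, so no extra backward pass through the input is needed.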
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5330