Achieving Certified Robustness and Maintaining Clean Accuracy via Vanilla Model Guide

24 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Adversarial examples, Certified Robustness
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Guidance from a pre-trained vanilla model improves the model's ability to represent clean inputs during certified robust training.
Abstract: Certified robustness provides a theoretical defense guarantee for deep neural network (DNN) models against adversarial examples within a given perturbation range. However, existing approaches to obtaining certified robustness require specialized certified robust training from scratch, which significantly decreases clean accuracy on normal inputs compared to vanilla-trained models, degrading the model's main inference task and hindering practical deployment of such security methods. We propose a practical training method that obtains certified robustness while maintaining clean accuracy. The method introduces a pre-trained vanilla model and applies singular value decomposition (SVD) to the weight matrix of each of its network layers, yielding rotation matrices and singular values that affect clean accuracy and certified robustness, respectively. The vanilla model serves as a guide model, and a knowledge transfer process is established based on the similarity between the rotation matrices of the guide model and those of the certification model being trained for certified robustness. To select the important rotation-matrix information and reduce computational cost, a low-rank approximation is used for practical knowledge transfer. Experimental results demonstrate that our approach significantly improves clean accuracy while only slightly reducing certified accuracy.
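The abstract's core mechanism (per-layer SVD into rotation matrices and singular values, a rotation-similarity measure between guide and certification model, and low-rank truncation) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the similarity measure (mean absolute column-wise cosine), and the rank `k` are assumptions for demonstration.

```python
import numpy as np

def decompose_layer(W: np.ndarray, k: int):
    """Hypothetical sketch: SVD of a layer's weight matrix W into
    rotation matrices (U, V^T) and singular values (S), keeping only
    the top-k components as a low-rank approximation."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k], S[:k], Vt[:k, :]

def rotation_similarity(U_guide: np.ndarray, U_cert: np.ndarray) -> float:
    """One simple choice of similarity between two orthonormal factors:
    mean absolute cosine of corresponding columns (abs handles the
    sign ambiguity of singular vectors). Not the paper's exact measure."""
    cos = np.abs(np.sum(U_guide * U_cert, axis=0))
    return float(np.mean(cos))

rng = np.random.default_rng(0)
W_guide = rng.standard_normal((64, 64))                   # vanilla (guide) layer
W_cert = W_guide + 0.01 * rng.standard_normal((64, 64))   # certification model layer

k = 8  # illustrative rank for the low-rank approximation
U_g, S_g, Vt_g = decompose_layer(W_guide, k)
U_c, S_c, Vt_c = decompose_layer(W_cert, k)
print(rotation_similarity(U_g, U_c))  # near 1 when the layers' rotations align
```

In a training loop, a similarity like this between corresponding layers could serve as a regularization signal that pulls the certification model's rotation factors toward the guide's, which is the intuition behind the knowledge transfer the abstract describes.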
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8901