Boosting Adversarial Robustness with CLAT: Criticality Leveraged Adversarial Training

19 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Adversarial robustness, criticality, computer vision, adversarial attacks, defense
TL;DR: CLAT mitigates adversarial overfitting by selectively fine-tuning robustness-critical layers, outperforming all baselines with over 2% improvements in both clean accuracy and adversarial robustness.
Abstract: Adversarial training (AT) is a common technique for enhancing neural network robustness. Typically, AT updates all trainable parameters, but such comprehensive adjustments can lead to overfitting and increased generalization errors on clean data. Research suggests that fine-tuning specific parameters may be more effective; however, methods for identifying these essential parameters and establishing effective optimization objectives remain unclear and inadequately addressed. We present CLAT, an innovative adversarial fine-tuning algorithm that mitigates adversarial overfitting by integrating "criticality" into the training process. Instead of tuning the entire model, CLAT identifies and fine-tunes a small subset of parameters in robustness-critical layers—those predominantly learning non-robust features—while keeping the rest of the model fixed. Additionally, CLAT employs a dynamic layer selection process that adapts to changes in layer criticality during training. Empirical results demonstrate that CLAT can be seamlessly integrated with existing adversarial training methods, enhancing clean accuracy and adversarial robustness by over 2% compared to baseline approaches.
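The selective fine-tuning mechanism described in the abstract can be illustrated in a few lines. Below is a minimal PyTorch sketch, assuming a standard image classifier; the criticality proxy used here (mean parameter-gradient magnitude on an FGSM-perturbed batch) and the fixed top-k selection are illustrative stand-ins, since the paper's actual criticality measure and layer-selection schedule are not given on this page.

```python
# Minimal sketch of criticality-guided selective adversarial fine-tuning.
# NOTE: the criticality score below is an assumed proxy for illustration,
# not CLAT's actual criticality metric.
import torch
import torch.nn.functional as F


def layer_criticality(model, x, y, eps=8 / 255):
    """Score each parameter tensor (as a per-layer proxy) by its mean
    gradient magnitude on an FGSM-perturbed batch."""
    x = x.detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad_x = torch.autograd.grad(loss, x)[0]          # input gradient
    x_adv = (x + eps * grad_x.sign()).detach()        # one-step FGSM example

    model.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()       # populate param grads
    return {
        name: p.grad.abs().mean().item()
        for name, p in model.named_parameters()
        if p.grad is not None
    }


def select_and_freeze(model, scores, k=2):
    """Unfreeze only the k highest-scoring parameter tensors; freeze the rest."""
    critical = set(sorted(scores, key=scores.get, reverse=True)[:k])
    for name, p in model.named_parameters():
        p.requires_grad_(name in critical)
    return critical
```

In a full training loop, these scores would be recomputed periodically so that the selected layers track the criticality drift the abstract describes, and only the unfrozen layers would then be updated by the chosen adversarial training objective (e.g., PGD-based AT).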
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1956