TL;DR: CLAT mitigates adversarial overfitting by selectively fine-tuning robustness-critical layers and achieves state-of-the-art performance, outperforming all baselines with over 2% improvements in both clean accuracy and adversarial robustness.
Abstract: Adversarial training (AT) enhances neural network robustness. Typically, AT updates all trainable parameters, which can lead to overfitting and increased errors on clean data. Research suggests that fine-tuning specific parameters may be more effective; however, methods for identifying these essential parameters and establishing effective optimization objectives remain inadequately addressed. We present CLAT, an adversarial fine-tuning algorithm that mitigates adversarial overfitting by integrating "criticality" into the training process. Instead of tuning the entire model, CLAT identifies and fine-tunes only the parameters of robustness-critical layers—those predominantly learning non-robust features—while keeping the rest of the model fixed. Additionally, CLAT employs a dynamic layer selection process that adapts to changes in layer criticality during training. Empirical results demonstrate that CLAT can be seamlessly integrated with existing adversarial training methods, enhancing clean accuracy and adversarial robustness by over 2% compared to baseline approaches.
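To make the mechanism described in the abstract concrete, here is a minimal PyTorch-style sketch of selective adversarial fine-tuning with periodic layer re-selection. The scoring routine `criticality_fn`, the adversary `attack_fn`, the top-k selection, and the re-selection schedule are illustrative placeholders, not CLAT's actual criticality measure or training procedure.

```python
import torch
import torch.nn.functional as F

def select_critical_layers(criticality_scores, k=2):
    """Pick the k layers with the highest (hypothetical) criticality score."""
    ranked = sorted(criticality_scores, key=criticality_scores.get, reverse=True)
    return set(ranked[:k])

def freeze_all_but(model, critical_layers):
    """Freeze every parameter except those belonging to the selected critical layers."""
    for name, param in model.named_parameters():
        owner = name.rsplit(".", 1)[0]          # module path owning this parameter
        param.requires_grad = owner in critical_layers

def selective_adversarial_finetune(model, loader, criticality_fn, attack_fn,
                                   epochs=10, reselect_every=2, lr=1e-3, device="cuda"):
    """Fine-tune only robustness-critical layers on adversarial examples (sketch)."""
    model.to(device)
    optimizer = None
    for epoch in range(epochs):
        # Dynamic layer selection: periodically re-score layers and re-freeze the rest.
        if epoch % reselect_every == 0:
            scores = criticality_fn(model, loader)      # placeholder scoring routine
            critical = select_critical_layers(scores, k=2)
            freeze_all_but(model, critical)
            optimizer = torch.optim.SGD(
                [p for p in model.parameters() if p.requires_grad], lr=lr)
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            x_adv = attack_fn(model, x, y)              # e.g. a PGD adversary (placeholder)
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x_adv), y)
            loss.backward()
            optimizer.step()
```

Because only the selected layers receive gradient updates, the optimizer touches a small fraction of the parameters, which is the intuition behind the reduced adversarial overfitting claimed in the abstract.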
Lay Summary: (1) AI models that recognize images can be fooled by small changes that are imperceptible to the human eye—known as adversarial attacks—which cause them to make incorrect predictions. Vision Transformers (models for vision classification tasks) are especially vulnerable to these attacks. (2) We developed a method that strengthens these models by training only the parts that matter most, making them harder to fool while keeping their performance high. (3) This will help improve the reliability of AI vision systems by making them more resilient to attacks, enabling safer deployment in real-world settings.
Primary Area: Deep Learning->Robustness
Keywords: Computer Vision, Adversarial robustness, Criticality, Computer vision, Adversarial attacks, Defense, Safety
Submission Number: 7296