# Identifying Robust Neural Pathways: Few-Shot Adversarial Mask Tuning for Vision-Language Models


## Abstract
Recent vision-language models (VLMs), such as CLIP, have demonstrated remarkable transferability across a wide range of downstream tasks by effectively leveraging the joint text–image embedding space, even with only a few data samples. Despite their impressive performance, these models remain vulnerable to adversarial attacks, raising significant concerns about their security and reliability in practical deployments. To address this issue, we propose Adversarial Mask Tuning (AdvMask), a method that effectively enhances the robustness of VLMs without directly modifying their pre-trained weights. Instead, our AdvMask learns a set of binary masks that selectively deactivate model parameters vulnerable to adversarial perturbations. By identifying robust neural pathways within the vision encoder, AdvMask facilitates the generation of features and predictions that are resistant to adversarial attacks. Furthermore, we introduce a Layer-wise Adaptive Feature Alignment (LAFA) loss, specifically designed to optimize AdvMask in few-shot scenarios. The LAFA loss adaptively aligns intermediate-layer features from clean and adversarial samples across each transformer block, enhancing the representational robustness of the model. Experimental results across multiple benchmarks confirm that our AdvMask approach substantially outperforms existing adversarial tuning techniques for VLMs, especially in few-shot settings.


## Setup

This work builds on the **FAP** and **Dassl** frameworks. Follow the instructions in [FAP](https://github.com/lionel-w2/FAP.git) to set up [Dassl](https://github.com/KaiyangZhou/Dassl.pytorch#installation).



## How to Run Our Code

Before running, set the `output_dir` and `data` path parameters in each shell script under the `scripts` folder to your own output and dataset directories.
Then use the provided shell script (i.e., `train_vit_b32_TeCoA_align_ada_amp_htune_LR_EP_base.sh`) in the `scripts/AdvMask` folder to run AdvMask with different hyperparameters (e.g., learning rate, `max_epochs`, lambda):


```bash
CUDA_VISIBLE_DEVICES=0 bash scripts/AdvMask/train_vit_b32_TeCoA_align_ada_amp_htune_LR_EP_base.sh 50.0 0.01 10 caltech101
```
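
To repeat the same configuration on several datasets, a small wrapper loop can help. The following is a minimal sketch, not part of the released scripts: the three numeric arguments are copied unchanged from the command above, the dataset names other than `caltech101` are assumptions (they must match dataset names available in your setup), and `DRY_RUN` and `run_one` are hypothetical helpers introduced only for this example.

```bash
#!/usr/bin/env bash
# Sketch: run one AdvMask configuration on several datasets in sequence.
SCRIPT=scripts/AdvMask/train_vit_b32_TeCoA_align_ada_amp_htune_LR_EP_base.sh

# Only caltech101 appears in this README; the other names are assumed
# examples and should be replaced with datasets you have prepared.
DATASETS="caltech101 oxford_pets dtd"

# DRY_RUN=1 (default) only prints each command; set DRY_RUN=0 to launch training.
DRY_RUN="${DRY_RUN:-1}"

run_one() {
  # $1 is the dataset name; the numeric arguments mirror the example command.
  if [ "$DRY_RUN" = "1" ]; then
    echo "CUDA_VISIBLE_DEVICES=0 bash $SCRIPT 50.0 0.01 10 $1"
  else
    CUDA_VISIBLE_DEVICES=0 bash "$SCRIPT" 50.0 0.01 10 "$1"
  fi
}

for dataset in $DATASETS; do
  run_one "$dataset"
done
```

With the default `DRY_RUN=1`, the loop prints the commands so you can check them before starting any training run.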
