NAP-Tuning: Neural Augmented Prompt Tuning for Adversarially Robust Vision-Language Models

Jiaming Zhang, Xin Wang, Xingjun Ma, Lingyu Qiu, Yu-Gang Jiang, Jitao Sang

Published: 01 Jan 2026, Last Modified: 01 Feb 2026. IEEE Transactions on Pattern Analysis and Machine Intelligence. License: CC BY-SA 4.0
Abstract: Vision-Language Models (VLMs) such as CLIP have demonstrated remarkable capabilities in understanding relationships between visual and textual data through joint embedding spaces. Despite their effectiveness, these models remain vulnerable to adversarial attacks, particularly in the image modality, posing significant security concerns. Building upon our previous work on Adversarial Prompt Tuning (AdvPT), which introduced learnable text prompts to enhance the adversarial robustness of VLMs without extensive parameter training, we present the Neural Augmentor framework for Multi-modal Adversarial Prompt Tuning (NAP-Tuning). NAP-Tuning extends prompting to both modalities (text and visual) and to multiple layers, and performs feature-level purification through targeted structural augmentation: lightweight neural modules called TokenRefiners learn to reconstruct purified features via residual connections, directly addressing distortions in the feature space. This structural intervention is what enables the framework to perform modality-specific and layer-specific feature rectification. Comprehensive experiments demonstrate that NAP-Tuning significantly outperforms existing methods across various datasets and attack types. Notably, under the challenging AutoAttack benchmark, our approach surpasses the strongest baselines by 32.3% on ViT-B16 and 31.3% on ViT-B32 while maintaining competitive clean accuracy. This work highlights the efficacy of internal feature-level intervention in prompt tuning for adversarial robustness, moving beyond input-side alignment approaches toward an adaptive defense mechanism that can identify and rectify adversarial perturbations across embedding spaces.
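The residual purification pattern described in the abstract can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration: the abstract does not specify the module's width, depth, or exact placement, so the bottleneck MLP, the constructor arguments, and the per-layer wiring are hypothetical; only the core idea, a lightweight per-token module that reconstructs purified features by adding a learned correction to the (possibly perturbed) input, reflects the paper's description.

```python
import torch
import torch.nn as nn

class TokenRefiner(nn.Module):
    """Hypothetical sketch of a lightweight purification module:
    a small bottleneck MLP predicts a corrective residual for each
    token, so refined = tokens + mlp(tokens)."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, bottleneck),
            nn.GELU(),
            nn.Linear(bottleneck, dim),
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Residual connection: reconstruct purified features by adding
        # a learned correction to the possibly perturbed input tokens.
        return tokens + self.mlp(tokens)

# Assumed wiring: one refiner per transformer layer and per modality
# (text / vision); during adversarial prompt tuning only the prompts
# and refiners would be trained while the CLIP backbone stays frozen.
layers, dim = 12, 768  # e.g. a ViT-B-sized encoder
vision_refiners = nn.ModuleList(TokenRefiner(dim) for _ in range(layers))

x = torch.randn(4, 197, dim)        # a batch of patch-token sequences
x_purified = vision_refiners[0](x)  # applied inside layer 0's forward pass
```

Because the refiner only outputs a residual, initializing its last linear layer near zero would leave the frozen backbone's features untouched at the start of tuning, which is a common design choice for such plug-in modules.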