# SS-TPT: Stability and Suitability-Guided Test-Time Prompt Tuning for Adversarially Robust Vision-Language Models

Vision-language models (VLMs) such as CLIP achieve strong zero-shot recognition but remain fragile under distribution shifts and adversarial perturbations. Recent test-time adaptation defenses improve robustness by leveraging many augmented views, but the added computation slows inference substantially, creating a clear robustness-throughput trade-off. To address this challenge, we present Stability and Suitability-guided Test-time Prompt Tuning (SS-TPT), which evaluates the quality of each augmented view via two complementary scores: (1) stability, which measures prediction invariance to weak augmentations, and (2) suitability, which measures feature-space density among views. These stability and suitability (SS) scores guide both adaptation and inference through an SS-guided consistency loss and an SS-weighted ensemble, selectively amplifying trustworthy views while suppressing corrupted ones. Extensive experiments show that SS-TPT outperforms prior state-of-the-art methods, achieving superior robustness-throughput trade-offs under a single hyperparameter setting across diverse datasets and varying numbers of views, which demonstrates both its practicality and its generality.
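To make the two scores and the weighted ensemble concrete, the following is a minimal NumPy sketch and not the released SS-TPT implementation. The abstract does not give the exact formulas, so every definition here is an assumption made for illustration: stability is taken as the fraction of weak augmentations whose top-1 prediction agrees with the view's own, suitability as the mean cosine similarity to the `k` nearest other views, and the two are combined by a product followed by a softmax. The function names and the `k` and `temperature` parameters are hypothetical, and the SS-guided consistency loss is not covered.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def stability_scores(probs, weak_probs):
    """Assumed stability: agreement rate under weak augmentations.

    probs:      (V, C) class probabilities for each of V augmented views.
    weak_probs: (V, K, C) class probabilities for K weak augmentations
                of each view.
    Returns a (V,) array in [0, 1].
    """
    base = probs.argmax(axis=-1)        # (V,) top-1 class per view
    weak = weak_probs.argmax(axis=-1)   # (V, K) top-1 class per weak copy
    return (weak == base[:, None]).mean(axis=1)


def suitability_scores(features, k=3):
    """Assumed suitability: feature-space density among views.

    Density is approximated by the mean cosine similarity between a view
    and its k most similar other views.
    features: (V, D) image features, V >= 2.
    Returns a (V,) array in [-1, 1].
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)      # exclude self-similarity
    k = min(k, len(f) - 1)              # never reach the masked diagonal
    nearest = np.sort(sim, axis=1)[:, -k:]
    return nearest.mean(axis=1)


def ss_weighted_ensemble(probs, stability, suitability, temperature=1.0):
    """Combine per-view predictions with SS-derived weights.

    The product-then-softmax combination is an assumption of this sketch.
    Returns the (C,) ensembled prediction and the (V,) view weights.
    """
    weights = softmax(stability * suitability / temperature)
    return weights @ probs, weights
```

Under these assumptions, a view whose prediction flips under weak augmentation, or whose feature lies far from the other views, receives a smaller weight, so it contributes less to the final prediction than views that are both stable and located in a dense region.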

To foster reproducibility and future research, our SS-TPT code, provided in the Supplementary Material, will be made publicly available on GitHub.