Keywords: Vision-Language Models, Prompt Learning, Test-Time Adaptation
TL;DR: We propose SPLAT (Spike-and-sLab Prompt Adaptation at Test-time), a selective prompt adaptation framework that modulates each token’s update magnitude based on test-time uncertainty.
Abstract: Not all prompt tokens contribute equally to generalization under distribution shift. Test-time prompt tuning offers a lightweight way to adapt vision-language models without retraining, but most methods update all prompt tokens uniformly, without considering their individual uncertainty or relevance. We introduce SPLAT (Spike-and-sLab Prompt Adaptation at Test-time), a selective adaptation framework that adjusts the update strength of each token based on its estimated uncertainty. SPLAT uses Monte Carlo Dropout to measure token-wise epistemic uncertainty and applies a gating function to scale gradient updates accordingly. This mechanism is grounded in a probabilistic interpretation of a spike-and-slab prior, allowing each token to be softly preserved or adapted. We further derive a variational learning objective that encourages stable adaptation while preserving pretrained knowledge. Experiments on ten cross-dataset and four domain generalization benchmarks in the zero-shot setting show that SPLAT not only improves accuracy over existing test-time prompt tuning methods but also reduces unnecessary updates and provides finer-grained, token-level control during adaptation, a capability absent in prior approaches.
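The token-wise gating described in the abstract can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption for illustration, not the authors' implementation: the toy mean-pooling model (`forward`), the choice to score a token's uncertainty as the output variance when Monte Carlo Dropout is applied to that token alone, the sigmoid gate with hypothetical threshold `tau` and `temperature`, and the convention that more uncertain tokens receive larger updates. Gradients are taken as given rather than computed from any particular test-time objective.

```python
import math
import random


def forward(tokens, weights):
    """Hypothetical toy model: mean-pool the prompt tokens, then take a dot product.

    Stands in for a vision-language model's scalar output; not the paper's model.
    """
    n, dim = len(tokens), len(weights)
    pooled = [sum(tok[d] for tok in tokens) / n for d in range(dim)]
    return sum(p * w for p, w in zip(pooled, weights))


def dropout(vec, p, rng):
    """Inverted dropout: zero each entry with probability p, rescale the rest by 1/(1-p)."""
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in vec]


def token_uncertainty(tokens, weights, p=0.5, num_samples=50, seed=0):
    """Monte Carlo Dropout estimate of per-token uncertainty (assumed formulation).

    For each token i, dropout is applied to token i only, and the variance of the
    model output across num_samples stochastic passes is used as its score.
    """
    rng = random.Random(seed)
    scores = []
    for i in range(len(tokens)):
        outputs = []
        for _ in range(num_samples):
            noisy = [dropout(tok, p, rng) if j == i else tok
                     for j, tok in enumerate(tokens)]
            outputs.append(forward(noisy, weights))
        mean = sum(outputs) / num_samples
        scores.append(sum((o - mean) ** 2 for o in outputs) / num_samples)
    return scores


def soft_gates(uncertainties, tau, temperature=0.05):
    """Sigmoid gate in (0, 1): near 0 keeps a token (spike), near 1 lets it adapt (slab).

    ASSUMPTION: higher uncertainty -> larger update; negate (u - tau) for the
    opposite convention. tau and temperature are hypothetical hyperparameters.
    """
    return [1.0 / (1.0 + math.exp(-(u - tau) / temperature)) for u in uncertainties]


def gated_step(tokens, grads, gates, lr):
    """One gradient-descent step in which token i's update is scaled by gates[i]."""
    return [[x - lr * g * gx for x, gx in zip(tok, grad)]
            for tok, grad, g in zip(tokens, grads, gates)]
```

For example, with `tokens = [[1.0, 0.0], [0.0, 0.0]]` and `weights = [1.0, 1.0]`, dropout cannot change the all-zero second token, so its score is 0 and its gate is smaller than the first token's. Passing a gate of `0.0` to `gated_step` leaves a token exactly at its current value, which mirrors the "spike" case of preserving a token; a gate near `1.0` recovers an ordinary, ungated gradient step.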
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 12007