Structured Semantics Meet Uncertain Visuals: A Unified Approach to Calibrated Test-Time Prompt Tuning
Abstract: Large vision-language models (VLMs) generalize well zero-shot but become overconfident and poorly calibrated under distribution shifts. Existing test-time adaptation (TTA) methods largely apply uniform entropy minimization with fixed geometric regularizers, ignoring instance-wise uncertainty and domain-specific visual cues. We propose Uncertainty-Calibrated Test-Time Prompt Tuning (UC-TPT), a label-free TTA framework targeted at improving reliability rather than solely maximizing accuracy. UC-TPT consists of three theoretically motivated components: (i) lightweight visual-to-text conditioning that injects shallow visual statistics (where shift is most pronounced) into prompts, yielding domain-conditioned predictions; (ii) an uncertainty-tempered entropy objective that adaptively controls distribution sharpness to curb overconfidence; and (iii) a topology-aware prompt regularizer that approximately preserves the pairwise semantic relations of manual prompts, stabilizing adaptation in the pretrained embedding space. Experiments on CLIP and BiomedCLIP across diverse benchmarks demonstrate that UC-TPT consistently outperforms existing methods in calibration robustness, yielding significant reductions in Expected Calibration Error (ECE) across a wide range of distribution shifts while maintaining competitive classification accuracy.
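The abstract does not give the formula for the uncertainty-tempered entropy objective (component ii). The sketch below is a minimal illustration of the general idea, under the assumption that the per-instance temperature is interpolated from the normalized entropy of the zero-shot prediction; the function names, bounds `t_min`/`t_max`, and the linear interpolation scheme are all illustrative, not the paper's actual formulation.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last (class) axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p, eps=1e-12):
    """Shannon entropy (nats) of each probability vector."""
    return -np.sum(p * np.log(p + eps), axis=-1)

def uncertainty_tempered_entropy(logits, t_min=0.5, t_max=2.0):
    """Hypothetical uncertainty-tempered entropy loss over a batch of logits.

    Each sample's temperature is set from the normalized entropy of its
    zero-shot prediction: confident samples get a low temperature (sharpened),
    uncertain samples a high one (softened), which curbs overconfident
    sharpening during entropy minimization.
    """
    num_classes = logits.shape[-1]
    p0 = softmax(logits)                        # zero-shot prediction
    u = entropy(p0) / np.log(num_classes)       # normalized uncertainty in [0, 1]
    t = t_min + (t_max - t_min) * u             # instance-wise temperature
    p = softmax(logits, temperature=t[:, None])
    return entropy(p).mean()                    # minimized w.r.t. prompt params
```

In a full TTA loop this scalar would be backpropagated to the learnable prompt tokens only; the sketch uses NumPy purely to make the objective's arithmetic explicit.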
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Tim_Georg_Johann_Rudner1
Submission Number: 8518