Test-Time Visual Concept Anchoring via Entropic Optimal Transport

Pawan Kumar

Published: 24 Apr 2026, Last Modified: 04 Jun 2026 | VisCon 2026 Poster | Readers: Everyone | CC BY 4.0

Keywords: test time, visual concepts

Abstract: Large vision-language models such as CLIP are widely deployed under conditions that differ from pre-training, which causes visual patch tokens to drift away from the semantic regions expected by the text-aligned head. We propose *test-time concept anchoring* (TTCA), a training-free module that treats the visual tokens of a test image as a source measure and a task-conditioned bank of text concepts as a target measure. TTCA solves an entropic optimal transport problem between the two measures and uses the resulting plan to softly project selected tokens toward semantic anchors before the downstream head consumes them. TTCA operates per sample, requires no backpropagation, and admits an unbalanced variant with a reject sink for open-set noise. On CLIP ViT-B/16, TTCA improves zero-shot accuracy on CIFAR-100 by $1.0$\%, improves mean accuracy across 9 corruption types (89\% of individual conditions improved), reduces distractor-induced accuracy degradation by 41\%, and improves CIFAR-100 calibration, all at roughly 4 ms per image with **no** model parameter changes.

Submission Number: 40
