Abstract: Test-time adaptation (TTA) updates models during inference to reduce error under distribution shift. While the established test-time loss minimizes the entropy of model predictions, we propose a new test-time loss that minimizes the entropy of the attention distributions computed by self-attention within the model. Our method, $\textit{LookSharp}$, minimizes the entropy of the CLS-to-patch attention in the final layer of a ViT, maintaining focused attention on shifted data. We show that attention entropy minimization improves robustness and is complementary to output entropy minimization on ImageNet-C and ImageNet-R.
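To make the objective concrete, below is a minimal PyTorch sketch of the two entropy losses, assuming the final ViT block's post-softmax attention weights are exposed with shape (batch, heads, tokens, tokens) and the CLS token at index 0. The function names, the renormalization of the CLS row after dropping the CLS-to-CLS entry, and the unweighted sum of the two terms are illustrative assumptions, not the paper's implementation.

```python
import torch


def attention_entropy(attn: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Mean Shannon entropy of the CLS-to-patch attention distributions.

    attn: post-softmax attention weights from the final ViT block,
          shape (batch, heads, tokens, tokens), CLS token at index 0.
    """
    cls_to_patch = attn[:, :, 0, 1:]                     # (batch, heads, patches)
    # Renormalize after dropping the CLS-to-CLS entry so each head's
    # CLS-to-patch weights form a valid distribution (assumption).
    p = cls_to_patch / cls_to_patch.sum(-1, keepdim=True).clamp_min(eps)
    return -(p * p.clamp_min(eps).log()).sum(-1).mean()  # mean over batch and heads


def output_entropy(logits: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Mean Shannon entropy of the predictions (standard output-entropy TTA loss)."""
    p = logits.softmax(-1)
    return -(p * p.clamp_min(eps).log()).sum(-1).mean()


# Toy check with random tensors standing in for a real ViT forward pass.
if __name__ == "__main__":
    B, H, N, C = 4, 12, 197, 1000                 # ViT-B/16, ImageNet-like shapes
    attn = torch.rand(B, H, N, N).softmax(-1)     # fake final-layer attention
    logits = torch.randn(B, C)                    # fake classifier outputs
    loss = attention_entropy(attn) + output_entropy(logits)
    print(float(loss))
```

In a full TTA loop, this combined loss would be backpropagated on unlabeled test batches to update a small set of model parameters; how the attention weights are extracted and which parameters are adapted are details the abstract does not specify.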
Submission Number: 30