LookSharp: Attention Entropy Minimization for Test-Time Adaptation

Published: 28 Feb 2026, Last Modified: 04 Apr 2026 | CAO Poster | CC BY 4.0
Keywords: test-time adaptation, computer vision, domain adaptation
TL;DR: We minimize the entropy of CLS-to-patch attention, encouraging the model to maintain focused attention on shifted data.
Abstract: Test-time adaptation (TTA) updates models during inference to reduce error under distribution shift. While entropy minimization over the output distribution has proven effective as a TTA loss, we study the intermediate distributions that transformers compute in their attention mechanism. We propose $\textit{LookSharp}$, which minimizes the entropy of CLS-to-patch attention in the final layer as a novel TTA objective, encouraging the model to maintain focused attention on shifted data. We demonstrate that attention entropy minimization improves robustness on ImageNet-C. We also show that it is complementary to output entropy minimization and maintains performance on clean data.
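To make the objective concrete, the following is a minimal sketch of the quantity being minimized: the Shannon entropy of a CLS-to-patch attention distribution. It is an illustration under stated assumptions, not the authors' implementation. The function names are hypothetical, and two details are assumed because the abstract does not specify them: the attention is normalized over patch tokens only, and per-head entropies are averaged. An actual TTA method would compute this as a differentiable loss in an autodiff framework and backpropagate through it; this sketch only evaluates the value.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_entropy(cls_to_patch_scores):
    """Shannon entropy (in nats) of one head's CLS-to-patch attention.

    Assumption: the distribution is normalized over patch tokens only.
    """
    probs = softmax(cls_to_patch_scores)
    # Skip zero-probability terms, since p * log(p) -> 0 as p -> 0.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_attention_entropy(scores_per_head):
    """Average the per-head entropies (assumed aggregation, not from the source)."""
    entropies = [attention_entropy(s) for s in scores_per_head]
    return sum(entropies) / len(entropies)
```

Uniform attention over N patches gives the maximum entropy, log N, while attention concentrated on a few patches gives a value near zero, so minimizing this quantity pushes the CLS token toward focused attention.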
Submission Number: 58