Keywords: test-time adaptation
Abstract: Test-Time Adaptation (TTA) has recently emerged as a promising strategy for adapting pre-trained models to changing data distributions at deployment time, without access to any labels. To address the error accumulation problem, various approaches have adopted the teacher-student framework. In this work, we challenge the common strategy of setting the teacher weights to an exponential moving average of the student by showing that error accumulation still occurs, albeit only on sequences longer than those commonly evaluated. We analyze the stability-plasticity trade-off within the teacher-student framework and propose to use an intransigent teacher instead. We show that keeping all of the teacher's weights fixed within existing TTA methods allows them to significantly improve their performance on multiple datasets with longer scenarios and smaller batch sizes. Finally, we show that the proposed changes are applicable to different architectures and are more robust to changes in hyperparameters.
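The contrast drawn in the abstract — an EMA-updated teacher versus an "intransigent" (frozen) teacher — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the function name `ema_update` and the weight-list representation are assumptions for clarity.

```python
def ema_update(teacher_weights, student_weights, alpha=0.999):
    """Standard EMA teacher update: teacher <- alpha * teacher + (1 - alpha) * student.

    An "intransigent" teacher, as proposed in the abstract, simply never calls
    this function: its weights stay at the pre-trained values for the entire
    adaptation sequence, trading plasticity for stability.
    """
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher_weights, student_weights)]


# Toy example: after one student update, the EMA teacher drifts slightly
# toward the student, while a frozen teacher would remain unchanged.
teacher = [1.0, 2.0]       # hypothetical pre-trained weights
student = [0.0, 0.0]       # hypothetical student weights after adaptation

ema_teacher = ema_update(teacher, student, alpha=0.9)
frozen_teacher = teacher   # intransigent: no update at all
```

Over a long adaptation sequence, the EMA teacher keeps absorbing a fraction of the student at every step, which is why errors can still accumulate; the frozen teacher cannot drift by construction.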
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3061