Distribution Shift-Aware Prediction Refinement for Test-Time Adaptation

20 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: test-time adaptation, domain adaptation, domain shift, test-time distribution shift
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: This work presents a novel test-time adaptation method that corrects inaccurate predictions caused by label distribution shifts by exploiting class-wise confusion patterns.
Abstract: Test-time adaptation (TTA) enables models to adapt to test domains using only unlabeled test data, addressing the challenge of distribution shift at test time. However, existing TTA methods focus mainly on input distribution shifts and often neglect class distribution shifts. In this work, we first show that existing methods can suffer performance degradation under class distribution shifts. We also show that similar class-wise confusion patterns are observed across different input distribution shifts. Based on these observations, we introduce a novel test-time adaptation method, Distribution shift-Aware prediction Refinement for Test-time adaptation (DART), which refines the predictions of a trained classifier by focusing on class-wise confusion patterns. In an intermediate phase between training and testing, DART trains a distribution shift-aware module by exposing it to batches with diverse class distributions constructed from the training dataset. At test time, this module detects and corrects class distribution shifts, significantly improving the accuracy of pseudo-labels on test data. Because existing TTA methods benefit from these more accurate pseudo-labels, DART serves as a valuable plug-in tool. Extensive experiments on CIFAR, PACS, ImageNet, and digit classification benchmarks demonstrate that DART corrects inaccurate predictions caused by test-time distribution shifts, yielding significant performance gains for TTA methods.
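The abstract describes DART's intermediate-time training only at a high level. The following is a minimal sketch, under stated assumptions, of two ingredients it mentions: building training batches whose class mix is deliberately skewed, and applying a learned correction to a classifier's outputs. The Dirichlet sampling of class proportions, the affine form `W z + b` of the correction, and every function name (`dirichlet`, `sample_shifted_batch`, `mean_prediction`, `refine_logits`) are hypothetical choices made for illustration, not details taken from the paper.

```python
# Illustrative sketch only: Dirichlet label-shift sampling and an affine logit
# correction are assumptions, not DART's published design.
import math
import random
from collections import defaultdict


def dirichlet(alpha: float, k: int, rng: random.Random) -> list[float]:
    """Sample class proportions from a symmetric Dirichlet(alpha) over k classes.

    Normalizes k independent Gamma(alpha, 1) draws; a small alpha yields
    strongly imbalanced proportions, i.e. a severe label shift.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # guard against underflow of every draw
        return [1.0 / k] * k
    return [d / total for d in draws]


def sample_shifted_batch(labels: list[int], batch_size: int, alpha: float,
                         rng: random.Random) -> list[int]:
    """Return training indices whose class mix follows Dirichlet(alpha) proportions."""
    by_class: dict[int, list[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    classes = sorted(by_class)
    props = dirichlet(alpha, len(classes), rng)
    # Draw each slot's class from the skewed proportions, then an example of
    # that class (with replacement).
    chosen = rng.choices(classes, weights=props, k=batch_size)
    return [rng.choice(by_class[c]) for c in chosen]


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def mean_prediction(batch_logits: list[list[float]]) -> list[float]:
    """Average softmax output over a batch: a summary of the predicted class mix."""
    probs = [softmax(z) for z in batch_logits]
    k = len(probs[0])
    return [sum(p[j] for p in probs) / len(probs) for j in range(k)]


def refine_logits(logits: list[float], matrix: list[list[float]],
                  bias: list[float]) -> list[float]:
    """Apply a hypothetical affine correction W z + b to one logit vector."""
    return [sum(w * z for w, z in zip(row, logits)) + b
            for row, b in zip(matrix, bias)]


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [i % 3 for i in range(30)]  # toy training set: 3 balanced classes
    batch = sample_shifted_batch(labels, batch_size=8, alpha=0.1, rng=rng)
    print([labels[i] for i in batch])  # class labels of a (likely) skewed batch
    summary = mean_prediction([[2.0, 0.1, 0.1], [1.5, 0.2, 0.3]])
    print(summary)
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(refine_logits([2.0, 1.0, 0.5], identity, [0.0, 0.0, 0.0]))  # [2.0, 1.0, 0.5]
```

A shift-aware module in this spirit would map a batch summary such as the one returned by `mean_prediction` to the correction parameters. In the usage above, the identity matrix and zero bias stand in for such a module's output, so the logits pass through unchanged.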
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2344