Abstract: We demonstrate that self-learning techniques like entropy minimization and pseudo-labeling are simple and effective at improving the performance of a deployed computer vision model under systematic domain shifts. We conduct a wide range of large-scale experiments and show consistent improvements irrespective of the model architecture, the pre-training technique, or the type of distribution shift. At the same time, self-learning is simple to use in practice because it does not require knowledge of, or access to, the original training data or scheme, is robust to hyperparameter choices, is straightforward to implement, and requires only a few adaptation epochs. This makes self-learning techniques highly attractive for any practitioner who applies machine learning algorithms in the real world. We present state-of-the-art adaptation results on CIFAR10-C (8.5% error), ImageNet-C (22.0% mCE), ImageNet-R (17.4% error) and ImageNet-A (14.8% error), theoretically study the dynamics of self-supervised adaptation methods, and propose a new classification dataset (ImageNet-D) which is challenging even with adaptation.
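The core idea behind the entropy-minimization variant of self-learning can be sketched in a few lines: given an unlabeled batch from the shifted test distribution, take a few gradient steps on the mean prediction entropy of the deployed classifier. The following is a minimal, self-contained illustration (not the paper's implementation) using a toy linear softmax classifier with a hand-derived entropy gradient; in practice one would adapt a full network, typically only its normalization-layer parameters.

```python
import numpy as np

# Toy stand-ins (hypothetical sizes): a deployed linear classifier W and an
# unlabeled test batch X drawn from a shifted domain.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 10)) * 0.1
X = rng.normal(size=(32, 16))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_entropy(W):
    p = softmax(X @ W)
    return float(-(p * np.log(p + 1e-12)).sum(axis=1).mean())

# A few adaptation steps: gradient descent on the mean prediction entropy.
# For H = -sum_j p_j log p_j with p = softmax(z), dH/dz_j = -p_j (log p_j + H).
lr = 0.1
before = mean_entropy(W)
for _ in range(20):
    p = softmax(X @ W)
    H = -(p * np.log(p + 1e-12)).sum(axis=1, keepdims=True)
    dz = -p * (np.log(p + 1e-12) + H) / X.shape[0]  # gradient averaged over batch
    W -= lr * (X.T @ dz)
after = mean_entropy(W)
assert after < before  # predictions on the test batch became more confident
```

The pseudo-labeling variants discussed in the paper follow the same pattern, but replace the entropy objective with a cross-entropy loss against the model's own (hard or soft) predictions.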
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
- [Reviewer qrwv] added more discussion on WILDS to the main part
- [Reviewer qrwv] added a discussion of the stability of the results across different seeds
- [Reviewer qrwv] updated Tables 1+2 to show both RPL and ENT
- [Reviewer 4Aye] updated the related work section with a discussion of the proposed papers
- [Reviewer 4Aye] added a comparison to Bartler et al. in Table 2, Table 10, and a detailed discussion in Appendix C.7
- [Reviewer 4Aye, iPbT] added a footnote to Sec. 1 and a sentence to Sec. 3 clearly defining self-learning as a superset of ENT and PL
- [Reviewer iPbT] fixed typos, renamed “standard cross-entropy loss” to “softmax cross-entropy loss”, removed “contrastive” from “contrastive TTT” in Sec. C.6
- [Reviewer iPbT] highlighted that TTA for ViTs was performed with the DINO objective only
- [Reviewer iPbT] attributed certain result classes to TENT/other papers in Sec. 6
- [Reviewer iPbT, 4Aye] added a paragraph and a new Table to Sec. 6 discussing adapting GN instead of BN layers
- [Reviewer iPbT] added error numbers on clean ImageNet to the ImageNet-D Table 11 in Sec. 6
- [Reviewer iPbT] added suggested related papers to the related work section, provided more context for the different papers, and significantly expanded the related work overall
- [Reviewer iPbT] introduced “self-learning” as a superset of different pseudo-labeling variants and entropy minimization
- [Reviewer iPbT] reworked Table 4 to include many more baselines and added a discussion of the baselines to Sec. 6
- [Reviewer iPbT] matched the architecture of the UDA-SS model in Table 2
- [Reviewer iPbT] discarded results without the mapping of permissible classes in Sec. D5
- [Reviewer iPbT] specified model architectures in Tables 8+9 for clarity
- [Reviewer iPbT] added a new Table summarizing important hyperparameter choices and attributing them to previous work; added a paragraph discussing this Table to Sec. 6
- [Reviewer iPbT] moved the paragraph discussing adaptation parameters from Sec. 4 to Sec. 3; added a paragraph discussing additional regularization to Sec. 3
- [Reviewer iPbT] changed the last sentence of the Conclusions to provide a more nuanced statement
- [Reviewer iPbT] reported adaptation results for a vanilla ResNet50 on ImageNet-A
- [Reviewer iPbT] added a missing number for a ResNet18 in Table 2 for adaptation with ENT
- [Reviewer 4Aye] added a paragraph discussing calibration results as well as a new Table
- [Reviewer 4Aye] added a paragraph discussing forgetting effects due to self-learning as well as a new Table
Assigned Action Editor: ~Ekin_Dogus_Cubuk1
Submission Number: 319