Better Practices for Domain Adaptation

Published: 16 May 2023, Last Modified: 08 Sept 2023 · AutoML 2023 Main Track
Keywords: domain adaptation, test-time adaptation, hyperparameter optimization, model selection
TL;DR: We devise a more rigorous domain adaptation framework for UDA, SFDA and TTA, covering better practices for data, training, hyperparameter optimization, model selection and validation.
Abstract: Distribution shifts are all too common in real-world applications of machine learning. Domain adaptation (DA) aims to address this by providing various frameworks for adapting models to the deployment data without using labels. However, the domain shift scenario raises a second, more subtle challenge: the difficulty of performing hyperparameter optimisation (HPO) for these adaptation algorithms without access to a labelled validation set. The unclear validation protocol for DA has led to bad practices in the literature, such as performing HPO using the target test labels when, in real-world scenarios, they are not available. This has resulted in over-optimism about DA research progress compared to reality. In this paper, we analyse the state of DA under good evaluation practice, by benchmarking a suite of candidate validation criteria and using them to assess popular adaptation algorithms. We show that there are challenges across all three branches of domain adaptation methodology: Unsupervised Domain Adaptation (UDA), Source-Free Domain Adaptation (SFDA), and Test-Time Adaptation (TTA). While the results show that realistically achievable performance is often worse than expected, they also show that using proper validation splits is beneficial, and that some previously unexplored validation metrics provide the best options to date. Altogether, our improved practices covering data, training, validation and hyperparameter optimisation form a new rigorous pipeline to improve benchmarking, and hence research progress, within this important field going forward.
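The central difficulty the abstract describes is selecting hyperparameters without target labels: a validation criterion must score candidate models using unlabelled target data alone. As a minimal illustration (not the paper's actual benchmark suite), the sketch below uses mean prediction entropy, one commonly used label-free proxy for target accuracy, to pick between two hypothetical hyperparameter configurations; the config names and arrays are invented for the example.

```python
import numpy as np

def mean_prediction_entropy(probs):
    """Average per-sample entropy of predicted class probabilities.

    Lower entropy means more confident predictions, a common (but
    imperfect) label-free proxy for target-domain accuracy.
    """
    eps = 1e-12  # guard against log(0)
    return float(-np.mean(np.sum(probs * np.log(probs + eps), axis=1)))

def select_by_entropy(candidates):
    """Pick the (name, probs) candidate with the lowest mean entropy."""
    return min(candidates, key=lambda c: mean_prediction_entropy(c[1]))

# Toy softmax outputs on unlabelled target data for two hypothetical configs.
confident = np.array([[0.90, 0.05, 0.05],
                      [0.80, 0.10, 0.10]])
uncertain = np.array([[0.40, 0.30, 0.30],
                      [0.34, 0.33, 0.33]])

best = select_by_entropy([("config_a", uncertain), ("config_b", confident)])
print(best[0])  # config_b (the more confident model is selected)
```

Note that confidence-based criteria like this can be fooled by models that are confidently wrong, which is exactly why the paper benchmarks a suite of candidate criteria rather than trusting any single one.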
Submission Checklist: Yes
Broader Impact Statement: Yes
Paper Availability And License: Yes
Code Of Conduct: Yes
Reviewers: Yes
CPU Hours: 500
GPU Hours: 2440
TPU Hours: 0
Evaluation Metrics: No
Estimated CO2e Footprint: 316.22