Keywords: Weight averaging, domain generalization, transfer learning, updatable machine learning
TL;DR: We propose a three-stage strategy for domain generalization; pre-training on a large corpus, fine-tuning on the available training domains and then editing by interpolating towards weights trained on auxiliary datasets.
Abstract: The goal of domain generalization is to train models that generalize well to unseen domains. To this end, the typical strategy is two-stage: first pre-training the network on a large corpus, then fine-tuning on the task's training domains. If the pre-training dataset is large enough, this pre-training is efficient because it will contain samples related to the unseen domains. Yet, large pre-training is costly and possible only for a few large companies. Rather than trying to cover all kinds of test distributions during pre-training, we propose to add a third stage: editing the featurizer after fine-tuning. To this end, we interpolate the featurizer with auxiliary featurizers trained on auxiliary datasets. This merging via weight averaging edits the main featurizer by including the features mechanisms learned on the auxiliary datasets. Empirically, we show that this editing strategy improves the performance of existing state-of-the-art models on the DomainBed benchmark by adapting the featurizer to the test domain. We hope to encourage updatable approaches beyond the direct transfer learning strategy.