Keywords: Domain generalization, data augmentation
Abstract: Many machine learning systems deployed in the real world face the challenge of domain generalization: generalizing to new domains whose data distributions differ from those seen during training. For example, in wildlife conservation, animal classification models can perform poorly on new camera deployments. Across cameras, the data distribution changes along multiple factors. Some of these factors are spurious (e.g., low-level background variations), and others are robustly predictive (e.g., habitat type). In this work, we aim to improve out-of-distribution performance by learning models that are invariant to spurious cross-domain variations while preserving predictive cross-domain variations. Specifically, we explore targeted augmentations, which rely on prior knowledge to randomize only the spurious cross-domain variations. On two domain generalization datasets, iWildCam2020-WILDS and Camelyon17-WILDS, targeted augmentations outperform the previous state of the art by 3.2 and 14.4 percentage points, respectively. These results suggest that using prior knowledge to target spurious cross-domain variations can be an effective route to out-of-distribution robustness.
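The toy sketch below illustrates the difference between randomizing a spurious factor and preserving a predictive one. It assumes that each example can be factored into named attributes. The `Example` fields, the factor values, and the helper names `spurious_pool` and `targeted_augment` are hypothetical illustrations. This is not the paper's implementation: real targeted augmentations act on images, not on labeled attributes.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Example:
    # Hypothetical factorization of an image into named factors (illustration only).
    label: str      # class to predict, e.g. species
    habitat: str    # robustly predictive cross-domain factor: preserved
    low_level: str  # spurious low-level background factor: randomized
    domain: str     # source domain, e.g. camera id


def spurious_pool(train):
    """Collect the spurious-factor values observed across all training domains."""
    return sorted({ex.low_level for ex in train})


def targeted_augment(ex, pool, rng):
    """Resample only the spurious factor; label, habitat and domain stay unchanged."""
    return replace(ex, low_level=rng.choice(pool))


train = [
    Example("deer", "forest", "fog", "cam_1"),
    Example("deer", "forest", "glare", "cam_2"),
    Example("zebra", "savanna", "night_ir", "cam_3"),
]
pool = spurious_pool(train)
rng = random.Random(0)  # fixed seed for reproducibility
augmented = [targeted_augment(ex, pool, rng) for ex in train]
for ex in augmented:
    print(ex)
```

The augmentation deliberately leaves `habitat` alone. A method that enforced invariance to every cross-domain difference would also discard that predictive signal.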