Abstract: Real-world data collected from multiple domains can have multiple, distinct distribution shifts over multiple attributes. However, state-of-the-art advances in domain generalization (DG) algorithms focus only on specific shifts over a single attribute. We introduce datasets with multi-attribute distribution shifts and find that existing DG algorithms fail to generalize. Using causal graphs to characterize the different types of shifts, we show that each multi-attribute causal graph entails different constraints over observed variables, and therefore any algorithm based on a single, fixed independence constraint cannot work well across all shifts. We present Causally Adaptive Constraint Minimization (CACM), an algorithm for identifying the correct independence constraints for regularization. Experiments confirm our theoretical claim: correct independence constraints lead to the highest accuracy on unseen domains. Our results demonstrate the importance of modeling the causal relationships inherent in a data-generating process, without which it can be impossible to know the correct regularization constraints for a dataset.
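
As an illustration of why a single, fixed independence constraint can be the wrong regularizer, the following is a minimal sketch and not the CACM implementation. It assumes a binary attribute, a toy feature that depends only on the label, and a simple mean-difference penalty (a linear-kernel MMD) as a stand-in for an independence regularizer. The function names `mean_gap`, `unconditional_penalty`, and `conditional_penalty` are hypothetical and chosen only for this example.

```python
import numpy as np

def mean_gap(feats, attr):
    """Squared distance between the mean feature vectors of the two
    attribute groups (a linear-kernel MMD; an assumed stand-in penalty)."""
    g0, g1 = feats[attr == 0], feats[attr == 1]
    if len(g0) == 0 or len(g1) == 0:
        return 0.0  # nothing to compare if one group is empty
    return float(np.sum((g0.mean(axis=0) - g1.mean(axis=0)) ** 2))

def unconditional_penalty(feats, attr):
    """Penalizes dependence between features and attribute (feats independent of attr)."""
    return mean_gap(feats, attr)

def conditional_penalty(feats, attr, label):
    """Penalizes dependence between features and attribute within each
    label class (feats independent of attr given label), averaged over classes."""
    classes = np.unique(label)
    return float(np.mean([mean_gap(feats[label == c], attr[label == c])
                          for c in classes]))

# Toy data (assumed): the attribute is correlated with the label,
# and the feature is a perfect function of the label alone.
label = np.array([0, 0, 0, 0, 1, 1, 1, 1])
attr = np.array([0, 0, 0, 1, 0, 1, 1, 1])
feats = label[:, None].astype(float)

print(unconditional_penalty(feats, attr))       # 0.25
print(conditional_penalty(feats, attr, label))  # 0.0
```

In this toy setting the unconditional penalty is nonzero even though the feature depends only on the label, so enforcing it would push the model away from a useful feature, while the conditional penalty is already satisfied. Which of the two is appropriate depends on the assumed data-generating process, which is the point the abstract makes about fixed constraints.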