Keywords: Out-of-distribution (OOD) robustness, OOD generalization, invariant data, domain generalization
Abstract: Scaling models on simple predictive objectives is often insufficient to overcome spurious correlations that degrade out-of-distribution generalization. While Domain Generalization (DG) methods aim to learn invariant representations, often based on causality principles, they can be computationally expensive and underperform simple Empirical Risk Minimization (ERM). We propose a data-centric alternative: Geometric Robustness via Invariant Training (GRIT). Instead of explicit causal modeling, GRIT enforces a geometric constraint during fine-tuning based on a small set of *noisy invariant pairs*, which implicitly encode an invariance property. We provide the first finite-sample analysis of this setting, showing that our framework generalizes latent linear causal models. We prove that GRIT achieves robust generalization at a rate of $O(1/\sqrt{k})$ in the number of pairs $k$, offering a scalable alternative to ERM and explicit causal modeling for out-of-distribution robustness.
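The abstract describes a fine-tuning objective that augments a task loss with a geometric penalty tying together the representations of each noisy invariant pair. A minimal sketch of such an objective, under the assumption of a linear feature map (matching the latent linear setting the analysis covers); all names here are illustrative, not from the paper:

```python
import numpy as np

# Hypothetical sketch of a GRIT-style objective: a standard task loss plus a
# geometric penalty that pulls the representations of each noisy invariant
# pair (x, x') together. Names and signatures are illustrative assumptions.

def features(W, X):
    """Linear feature map; the paper's analysis covers latent linear models."""
    return X @ W

def grit_loss(W, X, y, pairs_a, pairs_b, lam=1.0):
    # Task term: ordinary least-squares risk, standing in for ERM fine-tuning.
    preds = features(W, X).sum(axis=1)
    task = np.mean((preds - y) ** 2)
    # Invariance term: squared distance between paired representations,
    # averaged over the k noisy invariant pairs (pairs_a[i], pairs_b[i]).
    za, zb = features(W, pairs_a), features(W, pairs_b)
    inv = np.mean(np.sum((za - zb) ** 2, axis=1))
    return task + lam * inv
```

With exactly matching pairs the penalty vanishes and the objective reduces to the ERM term; the $O(1/\sqrt{k})$ rate in the abstract concerns how the benefit of this penalty scales with the number of pairs $k$.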
Submission Number: 18