Peters et al. (2016) introduced the problem of invariant modeling. In this problem, we observe feature/outcome data from multiple environments, and our goal is to identify the set of invariant features: those that maintain a stable predictive relationship with the outcome. Identifying such features is important both for robust generalization to new environments and for uncovering causal mechanisms. While previous methods primarily tackle this problem through hypothesis testing or regularized optimization, we take a Bayesian approach. We develop a probabilistic model of multi-environment data in which the indices of the invariant features are encoded as a latent variable. Under the same data-generating assumptions as Peters et al. (2016), we show that posterior inference in our model targets the true invariant features. We prove that this posterior is consistent, and we provide theoretical results about its contraction rate. In particular, we show that, under a certain metric, greater heterogeneity among environments leads to faster contraction of the posterior. When the number of features is large, we design an efficient variational inference algorithm to approximate the posterior. In both simulations and real-world data, we show that Bayesian invariant modeling is more accurate and scalable than existing approaches.
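To make the problem setup concrete, here is a small illustrative sketch (not the paper's model) of multi-environment data in the spirit of Peters et al. (2016): feature `x1` causes the outcome through a mechanism that is identical in every environment, while `x2` is a downstream effect of the outcome whose association with it shifts across environments. Fitting a single-feature regression per environment exposes the difference: the slope on the invariant feature is stable, while the slope on the spurious feature drifts. All variable names and parameter values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_env(n, shift):
    """One environment: x1 causes y with a fixed coefficient (invariant);
    x2 is an effect of y whose strength depends on the environment (spurious)."""
    x1 = rng.normal(size=n)
    y = 2.0 * x1 + 0.5 * rng.normal(size=n)   # stable mechanism in every environment
    x2 = shift * y + 0.5 * rng.normal(size=n) # environment-dependent association
    return x1, x2, y

def slope(x, y):
    # Least-squares slope of y on a single zero-mean feature x
    return np.dot(x, y) / np.dot(x, x)

b1s, b2s = [], []
for shift in (0.5, 1.0, 2.0):          # three heterogeneous environments
    x1, x2, y = sample_env(20_000, shift)
    b1s.append(slope(x1, y))           # stable across environments
    b2s.append(slope(x2, y))           # drifts with the environment

print("x1 slopes:", np.round(b1s, 2))  # roughly [2.0, 2.0, 2.0]
print("x2 slopes:", np.round(b2s, 2))  # clearly different across environments
```

An invariant-modeling method would infer, from such data, that `{x1}` is the invariant feature set; the abstract's Bayesian approach does so by placing a posterior over candidate feature subsets.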