%!TEX root = ../sublime-text.tex

\begin{abstract}
Invariant Risk Minimization (IRM) \citep{arjovsky2020invariant} proposes an optimization scheme that uses causal features to improve generalization.
However, most realizations of IRM lack an explicit feature selection strategy. Prior investigations \citep{rosenfeld2020risks,zhang2023missing} reveal failure cases when searching for causal features. In light of these concerns, recent work has demonstrated the promise of using sparsity \citep{zhouSparseInvariantRisk2022, fan2024eills} in IRM, and we make two specific contributions on that theme.
First, for the original sparse IRM formulation, we present the first correct non-asymptotic analysis of the effectiveness of sparsity for selecting invariant features. 
% \jdcomment{Just added qualifier about loss difference}
%With an %information theoretic  analysis of a standard generative model for IRM based on $L_0$-constraints for feature sparsity, we show that 
We show that sparse IRM with $L_0$ constraints can select invariant features and ignore spurious and random features. 
We further show that the sample complexity depends
polynomially on the number of invariant features and otherwise only logarithmically on the ambient dimensionality.
%which includes spurious and random features. 
%However, this existing work is computationally demanding, does not come with estimation error guarantees, and contains an error which we correct. 
%In this paper, we present two results towards efficient sparse invariant feature selection. 
%First, by refining the development in \citep{zhouSparseInvariantRisk2022}, we show that
%a variant of standard IRM with a $L_0$ constraint
% information-theoretic result (i.e., with no regard for computation) 
%can correctly recover the invariant features, ignoring all spurious and random features. 
Second, we present the first invariant feature recovery guarantees for a computationally efficient implementation of such sparse IRM based on iterative hard thresholding.
Prior methods are limited to a combinatorial search over the space of all sparse models; we instead propose a different loss function and show that optimizing it implies recovery of the invariant features under standard assumptions.
We present empirical results on standard benchmark datasets to demonstrate the effectiveness and efficiency of the proposed sparse IRM models.

\end{abstract}