\subsection{The theory behind sample reweighting}
\label{sample reweighting}
In this section, we denote the covariate features as $\text{FM} = [\text{FM}_\text{rob}, \text{FM}_\text{fal}]$, where $\text{FM}_\text{rob}$ denotes the robust features and $\text{FM}_\text{fal} = \text{FM} \setminus \text{FM}_\text{rob}$ denotes the spurious correlation features.
$g(\cdot)$ is a nonlinear function learned by the model with $\text{FM}_\text{rob}$ and $\text{FM}_\text{fal}$. 
$\beta_{\text{FM}_\text{rob}}$ and $\beta_{\text{FM}_\text{fal}}$ denote the linear weights of $\text{FM}_\text{rob}$ and $\text{FM}_\text{fal}$. 
\cite{he2023covariate} demonstrate that $\beta_{\text{FM}_\text{fal}}$ remains greater than 0 asymptotically, implying that models without the SCE module are inevitably affected by spurious correlation features. 
Additionally, the correlation between $\text{FM}_\text{rob}$ and $g(\text{FM}_\text{rob})$ exhibits less variation than the correlation between $\text{FM}_\text{fal}$ and $g(\text{FM}_\text{rob})$ when exposed to parameter variance. 
Even when sample reweighting is applied randomly, the estimate of $\beta_{\text{FM}_\text{rob}}$ is more robust and less variable than the estimate of $\beta_{\text{FM}_\text{fal}}$.
%
When predicting $y_i$, the target $y_i$ must be conditionally independent of $\text{FM}_\text{fal}$ given the minimal and optimal predictor $\text{FM}_\text{rob}$: $y_i \perp \text{FM}_\text{fal} \mid \text{FM}_\text{rob}$. 
% TODO: shorten
% It has been proven that $\text{FM}_\text{rob}$ is the minimal and optimal predictor if and only if $\text{FM}_\text{rob}$ is the minimal robust feature set~\citep{xu2022theoretical}. 
\cite{xu2022theoretical} prove that $\text{FM}_\text{rob}$ is the minimal and optimal predictor if and only if $\text{FM}_\text{rob}$ is the minimal robust feature set.
%
Therefore, we aim to capture the minimal robust feature set $\text{FM}_\text{rob}$ to obtain robust predictions. For simplicity, we refer to the minimal robust feature set simply as the robust feature set.
Let $\mathcal{W}$ be a set of sample weighting functions. Our objective is to find a weighting function in $\mathcal{W}_{\bot}$, the subset of $\mathcal{W}$ whose weighting functions make the features in $\text{FM}$ mutually independent.
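As an illustrative formalization (an assumption of this sketch rather than a definition taken from the cited works), suppose each $w \in \mathcal{W}$ is a nonnegative function of $\text{FM}$ with unit mean under the training distribution $P$, and let $\tilde{P}_w$ denote the reweighted distribution defined by $\tilde{P}_w(\text{FM}, y) = w(\text{FM})\,P(\text{FM}, y)$. Writing $\text{FM}_1, \dots, \text{FM}_d$ for the individual features (hypothetical notation introduced only here), the target subset can then be written as
\[
\mathcal{W}_{\bot} = \Bigl\{ w \in \mathcal{W} : \tilde{P}_w(\text{FM}) = \prod_{j=1}^{d} \tilde{P}_w(\text{FM}_j) \Bigr\},
\]
that is, the weighting functions under which the joint distribution of the features factorizes into the product of its marginals.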
\cite{zhou2022model} prove that, for linear models, there exists a weight function $w \in \mathcal{W}$ that makes the spurious correlation features independent of $y_i$. Moreover, \cite{2021Why} prove that, whether the data generation process is linear or nonlinear, weighted least squares (WLS) with a weighting function in $\mathcal{W}_{\bot}$ can lead to perfect feature selection. Therefore, we recast the task of finding the minimal robust features as that of making the features in $\text{FM}$ mutually independent through sample reweighting.
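
To make the WLS step concrete, the following is a minimal sketch under stated assumptions: we assume $n$ training samples with feature vectors $\text{FM}^{(i)}$ and targets $y_i$, and we stack the linear weights as $\beta = [\beta_{\text{FM}_\text{rob}}, \beta_{\text{FM}_\text{fal}}]$ (the sample index $(i)$, the sample size $n$, and the estimator symbol $\hat{\beta}_w$ are notation introduced only for this illustration). Given a weighting function $w \in \mathcal{W}_{\bot}$, WLS solves
\[
\hat{\beta}_w = \arg\min_{\beta} \sum_{i=1}^{n} w\bigl(\text{FM}^{(i)}\bigr) \bigl( y_i - \beta^{\top} \text{FM}^{(i)} \bigr)^{2}.
\]
Read this way, the cited result suggests that the components of $\hat{\beta}_w$ associated with $\text{FM}_\text{fal}$ would be expected to shrink toward 0 as $n$ grows, so that the remaining nonzero coefficients indicate $\text{FM}_\text{rob}$.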
