\section{Related works}
\subsection{Recurrent structure}
Recurrent structures are widely used across research fields, including computer vision (CV) and natural language processing (NLP). For instance, in stereo matching, \cite{9506870} proposed a recurrent hourglass network to capture multilevel information; by combining global and local features, the network adapts better to complex scenarios. In recommendation systems, DRIN~\citep{2022DRIN} employs a recurrent neural network (RNN) with 1$\times$1 convolution kernels to recurrently model the second-order feature interactions between a raw feature and the current feature. However, DRIN uses only a single vanilla RNN branch and thus overlooks the potential benefits of a two-stream structure whose branches have different stacking depths and capture global and local feature representations simultaneously. In contrast, our two-stream MSR structure is equipped with a self-attention mechanism, attenuation units, and an acceleration method. This design enhances the capacity of the network to capture multilevel high-order features efficiently.


\subsection{Spurious correlation in recommendation tasks}
In causal inference, causality refers to the cause-effect relationship between variables, where changes in one variable trigger changes in another. The objective of causal inference is to identify these relationships. Conversely, a spurious correlation is an observed association that does not necessarily indicate a genuine causal connection; it often arises from confounding variables or selection bias~\citep{spirtes2013causal}.
\cite{dou2022decorrelate} used feature decorrelation to promote feature independence, thereby mitigating spurious textual correlations in natural language understanding. \cite{zhang2021deep} employed sample reweighting to decorrelate variables and identify stable features for image classification.
In recommendation tasks, \cite{li2022causal} incorporated causal feature selection into factorization machines (FMs) to address confounding factors and selection bias, thereby improving the robustness of the recommendations. They learned a weight for each first-order and second-order feature to explicitly select causal features. However, causal features are not limited to these orders, and the optimization method they employed is inefficient, particularly when there are many features. In contrast, our proposed method learns a weight for every instance to select causal features and feature interactions of arbitrary order, and its efficient structure provides fast inference.
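
As a minimal sketch of the instance-reweighting idea discussed above, and not the exact objective of any cited work or of our method, consider $n$ instances with $d$-dimensional feature vectors $x_i$ and instance weights $w_i$. The symbols $x_{ij}$, $w_i$, and $\hat{C}_{jk}$ are introduced here purely for illustration, and only linear dependence is considered. Under these assumptions, the weighted covariance between features $j$ and $k$ can be written as
\[
\hat{C}_{jk}(w) = \sum_{i=1}^{n} w_i\, x_{ij}\, x_{ik} - \Bigl(\sum_{i=1}^{n} w_i\, x_{ij}\Bigr)\Bigl(\sum_{i=1}^{n} w_i\, x_{ik}\Bigr),
\]
and a generic decorrelation objective selects the weights that suppress all pairwise covariances:
\[
\min_{w}\ \sum_{1 \le j < k \le d} \hat{C}_{jk}(w)^{2}
\quad \text{subject to} \quad w_i \ge 0,\ \ \sum_{i=1}^{n} w_i = 1 .
\]
With uniform weights $w_i = 1/n$, $\hat{C}_{jk}$ reduces to the ordinary sample covariance, so the optimization can be read as a search for a reweighted sample in which the features are approximately uncorrelated.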



