\section{Experiments}
\input{table/dataset}


\subsection{Datasets and evaluation metrics}
\label{module11}
We conduct extensive experiments on four public benchmark datasets and our production dataset. Table~\ref{statis of dataset} summarizes their statistics. We use two evaluation metrics: the area under the ROC curve (AUC) and LogLoss (cross-entropy loss).
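For reference, both metrics have standard definitions, stated here in notation introduced only for illustration: $y_i \in \{0,1\}$ is the click label of sample $i$, $\hat{y}_i$ is the predicted click probability, $N$ is the number of test samples, and $P$ and $Q$ are the sets of positive and negative samples, respectively.
\[
\mathrm{LogLoss} = -\frac{1}{N}\sum_{i=1}^{N}\bigl[y_i\log\hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\bigr],
\qquad
\mathrm{AUC} = \frac{1}{|P|\,|Q|}\sum_{i\in P}\sum_{j\in Q}\mathbf{1}\bigl[\hat{y}_i > \hat{y}_j\bigr].
\]
Ties $\hat{y}_i = \hat{y}_j$ are conventionally counted as $\frac{1}{2}$; a higher AUC and a lower LogLoss indicate better performance.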
\\
\noindent\textbf{Criteo\footnote{http://labs.criteo.com}.}
Criteo is an online advertising and CTR prediction dataset that includes real-world displayed ads. 

\noindent\textbf{Avazu\footnote{https://www.kaggle.com}.}
Avazu is a widely used dataset in CTR prediction.
This dataset includes real-world advertising data, focusing on mobile ads and user demographics.

\noindent\textbf{Frappe\textcolor{blue}{\footnotemark[3]}.}
Frappe includes smartphone sensor data, which is popular in human activity recognition research.

\noindent\textbf{MovieLens\textcolor{blue}{\footnotemark[3]}.}
MovieLens is a widely used recommender-system dataset; its movie ratings data are used to predict the likelihood of a user clicking on a movie.
\footnotetext[3]{https://github.com/openbenchmark/BARS}


\noindent\textbf{User Game Content (UGC).}
The UGC dataset consists of clicking behavior data for user game content collected from our online service over one month. We collected clicking logs with anonymous user IDs, user behavior histories, game content features (e.g., categories), and context features (e.g., operation time).
\input{table/1_sota}




\subsubsection{Baseline}
\label{baseline}
To fairly and accurately verify the improvements achieved by the proposed MSR structure and SCE module, we use a two-stream MLP network as the ``Baseline'' method. The MLP sizes of the two branches are set to [400, 400, 400] and [800] as these values yield the best performance. This approach aligns with established best practices for two-stream CTR models~\citep{mao2023finalmlp}, providing a solid foundation for evaluating any modifications or novel approaches in a controlled manner.
To further verify the superiority of the proposed recurrent structure, we replace the MLP in the ``Baseline'' model with a self-attention structure and denote the resulting model as ``Baseline + SA''. The implementations of these models can be found in our open-source code.


\subsubsection{Implementation}
To conduct fair comparisons with the recently proposed cutting-edge methods, we use the released preprocessed datasets produced by~\cite{cheng2020adaptive} with the same splitting and preprocessing procedures. All the models and experiments are implemented based on the FuxiCTR toolkit~\citep{zhu2021open}. RE-SORT and the ``Baseline'' model follow the same experimental settings as those of FinalMLP~\citep{mao2023finalmlp}: for all five datasets, the batch size is set to 4,096, the embedding size is set to 10, and the learning rate is set to 0.001 unless otherwise specified. The number of chunks $k$ in the CP is set to 50 for Criteo and UGC, 10 for Avazu and MovieLens, and 1 for Frappe. The remaining hyperparameters are kept constant for the five datasets. To mitigate overfitting, we implement early stopping based on the AUC attained on the validation dataset.
In the D-SR and S-SR, we define the attenuation coefficient $r$ as $\left[\frac{31}{32},\frac{63}{64},\ldots,\frac{2^{M+5}-1}{2^{M+5}}\right]$.
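Under this listing, the coefficients admit a simple closed form; the index $m$ and its range are introduced here only for illustration and are read off from the first and last entries of the list:
\[
r_m = \frac{2^{m+5}-1}{2^{m+5}} = 1 - 2^{-(m+5)}, \qquad m = 0, 1, \ldots, M .
\]
For example, $r_0 = 31/32 = 0.96875$ and $r_1 = 63/64 = 0.984375$, so the coefficients increase toward 1 as $m$ grows.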





\subsection{Comparison with the SOTA methods}
We classify twenty competitive approaches into four categories: first-order, second-order, high-order, and ensemble methods. To ensure a fair comparison, we run each method 10 times with different random seeds on a single GPU (NVIDIA A100) and report the average testing performance. Then, we conduct two-tailed t-tests~\citep{liu2019feature} to compare the performance. As shown in Table~\ref{tab1}, on all four datasets, RE-SORT outperforms all the other models in terms of both AUC and LogLoss, which validates its generalization ability. The greatest improvement is observed on Avazu and Frappe, where RE-SORT exhibits a relative improvement of 0.21\% over the second-best-performing GDCN.
Previous two-stream methods such as FinalMLP and GDCN perform robustly, but RE-SORT constructs more effective multilevel features and eliminates spurious correlations, thereby achieving the best performance.
%
Accurate CTR prediction is particularly important in applications with large user bases: even a 0.1\% AUC increase, although seemingly modest, can substantially boost overall revenue~\citep{cheng2016wide,wang2017deep,2017Model}\textcolor{blue}{\footnotemark[4]}.
\footnotetext[4]{The GDCN yields a 0.17\% improvement over the previous SOTA approach on Criteo. With its stacked structure, GDCN achieves a 0.04\% improvement over the baseline. }
%
Moreover, compared to the previous SOTA model (GDCN), RE-SORT is almost 4 times faster, an additional practical advantage that is detailed in the speed comparison below.
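For clarity, relative improvements of this kind are commonly computed as follows; we state this as an assumed convention, with $s_A$ and $s_B$ denoting the mean AUC of methods $A$ and $B$ over the repeated runs (symbols introduced for illustration):
\[
\Delta_{\mathrm{rel}} = \frac{s_A - s_B}{s_B} \times 100\% .
\]
Likewise, when each method is run $n$ times, one common two-sample statistic for a two-tailed t-test is Welch's form (an assumption, since the exact variant is not specified here):
\[
t = \frac{s_A - s_B}{\sqrt{\sigma_A^2/n + \sigma_B^2/n}},
\]
where $\sigma_A^2$ and $\sigma_B^2$ are the sample variances of the two sets of runs and $n = 10$ in our setting.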

\input{table/speed}

\input{table/2_Ablation_study}
\noindent\textbf{Speed comparison.} 
\label{speed exp}
The rapid inference speed of RE-SORT is attributed to its MSR structure. The SCE module does not affect the inference speed because it is used only during training. We choose ``Baseline''; ``Baseline + SA''; and the cutting-edge two-stream GDCN, FinalMLP, LightDIL, and DRIN methods for comparison. Table~\ref{speed} demonstrates that RE-SORT substantially outperforms the previous SOTA methods; it outperforms DRIN, the fastest among these methods, by approximately 90\%.



\subsection{Ablation study}
\label{module22}
We verify the effectiveness of the MSR and the SCE structures through extensive experiments.




\noindent\textbf{Improvement achieved with the MSR and SCE.} 
\label{ablation_exp}
Table~\ref{ablation} shows that MSR and SCE consistently improve the performance of the model, exceeding the ``Baseline'' by an average of 0.25\% in terms of the AUC across all three datasets. To fairly compare the performances of various two-stream structures, the results presented in the table are the best experimental results obtained with different depth combinations for the two streams within their corresponding network structures. Our MSR outperforms the two-stream MLP and two-stream SA structures by an average of 0.12\% and 0.11\% AUC, respectively. Tables~\ref{speed} and~\ref{ablation} demonstrate that the proposed MSR structure outperforms MLP and attention-based models in terms of both accuracy and computational efficiency.


\input{img/depth}
\noindent\textbf{Two streams in MSR with different depths.}
\label{depth_exp}
We verify the performance of RE-SORT with different combinations of the S-SR and D-SR streams at various depths. On the five datasets, the best performance is obtained with D-SR and S-SR depths of 3 and 1, respectively. Figure~\ref{1} shows a visualization example based on the MovieLens dataset. When the D-SR and S-SR depths are the same (1, 2, and 3), the AUC is similar to that of the ``Baseline'' (0.9712, 0.9713, and 0.9711 vs. 0.9711), which indicates that the MSR structure cannot construct effective multilevel features when both streams have the same stacking depth. When the depths of the D-SR and S-SR streams are set to 3 and 1, respectively, the result is 0.12\% greater than that of the ``Baseline''.



\input{img/decor}
\noindent\textbf{Feature visualization for SCE.}
We randomly select variables with 50 dimensions from the input features of the CTR predictor learned by ``Baseline'' and SCE. Figure~\ref{decorrelation} shows that the correlations among the features are eliminated by SCE, while the model performance is notably improved. To a certain extent, this experiment substantiates that SCE helps the model identify critical features and learn more effective associations with CTR predictions, thus enhancing its generalization ability. Additional experiments, such as a comparative analysis of using different numbers of random Fourier spaces, are reported in the appendix.
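As a point of reference, if the correlations in Figure~\ref{decorrelation} are read as Pearson coefficients (an assumption, as other dependence measures are possible), the entry for feature dimensions $a$ and $b$ over $n$ samples is
\[
\rho_{ab} = \frac{\sum_{i=1}^{n}(z_{i,a}-\bar{z}_a)(z_{i,b}-\bar{z}_b)}{\sqrt{\sum_{i=1}^{n}(z_{i,a}-\bar{z}_a)^2}\,\sqrt{\sum_{i=1}^{n}(z_{i,b}-\bar{z}_b)^2}},
\]
where $z_{i,a}$ denotes the value of dimension $a$ for sample $i$ and $\bar{z}_a$ its sample mean (illustrative notation). Decorrelated features correspond to off-diagonal entries $\rho_{ab}$ close to zero.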