\section{Introduction}

% Regression analyses have long been the bedrock of predictive modelling, where the primary goal is to forecast future outcomes based on historical data. In its conventional form, these predictions are often distilled down to singular, definitive values, which are derived from a set of influencing factors or covariates. However, as the applications of predictive modelling have grown more diverse and complex, especially in critical sectors like healthcare and weather forecasting, there is an increasing realization that a mere point prediction is not sufficient. Stakeholders in these sectors often need to gauge the level of confidence or uncertainty associated with these predictions.

Regression analysis has traditionally been used to predict future outcomes as single point values estimated from empirical data. However, as predictive modelling expands into critical areas such as healthcare \citep{nicora2022evaluating,tran2021deep,avati2018improving} and weather forecasting \citep{forecast}, there is an increasing need to quantify the confidence or uncertainty surrounding these predictions, beyond point estimates alone. As \citet{kompa2021second} put it, ``\textit{medical ML should have the ability to say `I don't know' and potentially abstain from providing a diagnosis or prediction when there is a large amount of uncertainty for a given patient}''. A growing number of publications underscore the importance of uncertainty quantification in fields such as radiology \citep{chua2023tackling}, digital pathology \citep{linmans2023predictive}, digital histopathology for cancer \citep{dolezal2022uncertainty}, and radiation oncology \citep{barragan2022towards}.
% \hl{Misha: hammer that uncertainty is important problem; more examples; review/survey papers}

The Recursive Feature Machine (RFM) \citep{radhakrishnan2022feature} is a data-adaptive kernel method that learns a feature matrix capturing the relevance of, and correlations between, input features, thereby offering a lens for interpreting the data. In this work, we study RFMs for uncertainty estimation in both in-distribution and out-of-distribution settings. We compare our probabilistic RFMs against prominent alternatives, in particular state-of-the-art probabilistic tree-based methods such as NGBoost \citep{duan2020ngboost} and CatBoost ensembles \citep{prokhorenkova2018catboost}, and show that they are competitive.
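
For concreteness, we sketch the RFM in its basic form following \citet{radhakrishnan2022feature}; the Laplace kernel and the plain average gradient outer product (AGOP) update shown here are illustrative choices, and variants of both exist. Given training inputs $X = (x_1,\dots,x_n)$ with $x_i \in \mathbb{R}^d$ and targets $y \in \mathbb{R}^n$, the RFM alternates between fitting a kernel predictor whose kernel is parameterised by a positive semi-definite feature matrix $M \in \mathbb{R}^{d \times d}$ and updating $M$ with the AGOP of that predictor:
\begin{align*}
k_M(x, z) &= \exp\!\left(-\frac{\|x - z\|_M}{L}\right), \qquad \|x - z\|_M = \sqrt{(x - z)^\top M (x - z)},\\
f_t(x) &= k_{M_t}(x, X)\,\bigl(K_{M_t} + \lambda I\bigr)^{-1} y,\\
M_{t+1} &= \frac{1}{n}\sum_{i=1}^{n} \nabla_x f_t(x_i)\,\nabla_x f_t(x_i)^\top,
\end{align*}
where $K_{M_t}$ is the $n \times n$ kernel matrix on the training inputs, $k_{M_t}(x, X)$ the row vector of kernel evaluations, $L > 0$ a bandwidth and $\lambda \geq 0$ a ridge parameter. Assuming a zero-mean GP prior with covariance $k_M$ and Gaussian observation noise of variance $\sigma^2$, reusing the learned $M$ yields the standard GP predictive mean and variance at a test input $x_*$:
\begin{align*}
f_M(x_*) &= k_M(x_*, X)\,\bigl(K_M + \sigma^2 I\bigr)^{-1} y,\\
\Sigma_M(x_*) &= k_M(x_*, x_*) - k_M(x_*, X)\,\bigl(K_M + \sigma^2 I\bigr)^{-1} k_M(X, x_*).
\end{align*}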

% %%%%%%%% NEW START %%%%%%%%
% \begin{figure}[t]
%     \centering
%     % \includegraphics{figuresTikz/method_gp_rfm}
%     % \tikzsetnextfilename{method_gp_rfm}
%     \input{tikz/method_gp_rfm}
%     \caption{Adopting the learned data-adaptive kernel ``feature matrix'' $\mM$ from the RFM within GPs to obtain predictive mean $f_\mM$ and covariance $\mSigma_\mM$.}
%     \label{fig:method-gp-rfm}
% \end{figure}
% %%%%%%%% NEW END %%%%%%%%

The Gaussian process (GP) \citep{rasmussen2006gaussian} is often the method of choice for estimating predictive uncertainty, as it provides a full predictive distribution rather than only a point estimate. However, tree-based techniques such as NGBoost and CatBoost ensembles are gaining traction. These methods not only rival GPs in predictive accuracy but have also shown superior results on specific uncertainty metrics such as negative log-likelihood~(NLL), coverage error~(CE), and prediction interval length~(IL), especially on tabular or categorical data. 
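
As an illustration, for Gaussian predictive distributions $\mathcal{N}(\mu_i, \hat{\sigma}_i^2)$ at test points $i = 1,\dots,n$ with targets $y_i$, these metrics are commonly computed as below. This is a sketch assuming symmetric central prediction intervals at nominal level $1-\alpha$, with $z_{1-\alpha/2}$ the standard normal quantile; other conventions exist.
\begin{align*}
\mathrm{NLL} &= \frac{1}{n}\sum_{i=1}^{n}\left[\frac{1}{2}\log\bigl(2\pi\hat{\sigma}_i^2\bigr) + \frac{(y_i - \mu_i)^2}{2\hat{\sigma}_i^2}\right],\\
\mathrm{CE} &= \left|\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\bigl\{|y_i - \mu_i| \le z_{1-\alpha/2}\,\hat{\sigma}_i\bigr\} - (1-\alpha)\right|,\\
\mathrm{IL} &= \frac{1}{n}\sum_{i=1}^{n} 2\,z_{1-\alpha/2}\,\hat{\sigma}_i.
\end{align*}
For example, at the $95\%$ level we have $\alpha = 0.05$ and $z_{0.975} \approx 1.96$, so each interval has width $3.92\,\hat{\sigma}_i$.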

%%%%%%%% OLD START %%%%%%%%
In this work, we show that combining GPs with the data-adaptive kernel learned by the RFM closes this performance gap, yielding results on par with, or better than, gradient boosting approaches. In summary, (i)~we introduce the RFM to the GP community and
(ii)~establish that its performance is comparable, and in some cases superior, to existing state-of-the-art methods. More specifically, our contributions are as follows:
%%%%%%%% OLD END %%%%%%%%
% %%%%%%%% NEW START %%%%%%%%
% Our study combines GPs with the data-adaptive kernel derived from the RFM as visualised in \Cref{fig:method-gp-rfm}. 
% Our resulting GP construction demonstrates that it can achieve results which are on par with or even surpass gradient-based boosting approaches, effectively bridging the existing performance gap.
% In summary, (i)~we introduced the RFM to the GP community and (ii)~established that the performance of RFM is comparable to, or even superior to, existing state-of-the-art methods. More specifically, we have the following contributions:
% %%%%%%%% NEW END %%%%%%%%
\begin{itemize}
    % RFM with superior performance for uncertainty quantification
    % \item \hl{from rebuttal: main contribution (1): Our resulting GP-RFM demonstrates itself as a "competitive alternative to state-of-the-art boosting-based methods."}
    % We bring RFMs to the GP community and illustrate that features derived from the RFM notably improve uncertainty performance on tabular datasets. Given their ability to produce results that are either comparable to or, in certain instances, surpass state-of-the-art methods, the RFM positions itself as the new benchmark for applications that demand precise uncertainty estimation. 
    % correlation of learnt features

    \item We show that GP-RFM, a GP equipped with the kernel learned by the RFM, is a strong alternative to leading boosting-based methods, improving uncertainty estimation on tabular datasets through features learned by the RFM. Because it matches, and in some cases exceeds, the performance of existing state-of-the-art methods, the RFM positions itself as a new benchmark for applications that demand accurate uncertainty estimates.
    % \item \hl{from rebuttal: main contribution (2): We "bridge fields" with a method which "has not previously featured in the GP literature".}

    \item We bring RFMs to the GP community and show that features derived from the RFM notably improve uncertainty estimates on tabular datasets. Comparing the RFM with traditional GP techniques, we further show that the RFM extracts more general feature representations because it captures correlations between input features, which can in turn significantly improve the resulting uncertainty estimates.

    
    % We establish a link between diagonal RFMs and conventional GP techniques, exploring their commonalities and distinctions. Although both methods are trained using distinct learning paradigms, our findings indicate a strong correlation in the features learned for certain datasets. However, this correlation is not consistent across all datasets, suggesting that these paradigms can result in different behaviours.
    
    % while the two learning paradigms are similar, they are not identical.
    % better features through feature correlation
    % \item \hl{from rebuttal: we demonstrate that a non-diagonal metric as used by the RFM "is something that the community has been missing".}
    % Comparing the RFM with traditional GP techniques, we further show that the RFM can extract more general feature representations due to its ability to capture correlation between features. This can in turn significantly improve the resulting uncertainty estimates.
    % ID and OOD results
    \item To assess the robustness of the RFM, we evaluate it on out-of-distribution data under label and covariate shift, where it outperforms other uncertainty quantification methods.
\end{itemize}

% \item We establish a connection between RFMs and traditional GP techniques, examining their similarities and differences. We demonstrated that RFM can extract feature representations that significantly improve uncertainty estimation. \daniel{maybe more in detail to highlight aspects that we do in the workshop paper?}
% \item We establish that RFM-based methods can be synergized with features derived from neural networks, achieving results that stand shoulder to shoulder with deep ensemble methods. This presents RFM as a viable alternative, especially in scenarios where the computational burden of training multiple deep networks is a concern.




% Emphasizing the importance of ongoing assessment and evolution in the domain, our research also delves into the intrinsic relationship between RFM and traditional GP.

% Additionally, we demonstrate how RFM can be integrated with the learned embeddings from Neural Networks, delivering uncertainty estimations comparable to those of NN ensembles. This presents a cost-effective alternative, circumventing the need to train multiple Neural Network models.

% But the world of machine learning is ever-evolving, and the recent introduction of Recursive Feature Machines (RFMs) has stirred interest among researchers and practitioners. RFMs promise to enhance the way we extract and interpret features from data, especially when using kernel methods, which are techniques used to find patterns in data. In this research, we embark on a deep dive into the capabilities and potential of RFMs. Our exploration spans their efficacy in refining predictions, their prowess in uncertainty estimations, and how they stack up against other leading techniques in the field. Furthermore, in an era where deep learning networks are revolutionizing various domains, we also investigate how RFMs can be seamlessly integrated with these advanced architectures. By doing so, we aim to paint a holistic picture of RFMs, shedding light on their potential to reshape the future of predictive modelling and uncertainty estimation.





% \subsection{Main contribution}
% In this work, we have the following contributions: