\section{Prior work}




Numerous uncertainty quantification methods have been proposed for tabular data. Here, we focus on flexible methods with state-of-the-art predictive performance.


\paragraph{Gaussian processes.}
As a flexible, non-parametric Bayesian regression model, the Gaussian process (GP) is a well-studied and natural choice for uncertainty quantification \citep{rasmussen2006gaussian}. A GP is specified by a mean function and a kernel that serves as its covariance function. The crucial challenge is choosing the kernel, as it encodes high-level assumptions about the data. Common choices are the Radial Basis Function (RBF) and Laplace kernels, which have only a few parameters to optimize. For more flexibility, kernels with Automatic Relevance Determination (ARD) weight the covariates through learnable parameters \citep{mackay1992bayesian,neal1996bayesian}.
\citet{vivarelli1998discovering} generalize the diagonal ARD weighting to general positive-definite weighting matrices or low-rank factorizations. \citet{garnett2014active} and \citet{letham2020re} use a factorized weighting matrix and approximate the posterior with a Laplace approximation for active learning and Bayesian optimization. For Bayesian optimization, sparse axis-aligned subspace (SAAS) GPs leverage structural sparsity in the kernel \citep{eriksson2021high}.
Instead of designing more expressive kernels, one can equivalently transform the inputs and apply a standard kernel \citep{mackay1998introduction}. Neural networks have been studied both as feature extractors for GPs \citep{calandra2016manifold,wilson2016deep} and as models whose last layer approximates a GP \citep{huang2015scalable,liu2020simple}. Our approach combines both strategies by leveraging the recently proposed Recursive Feature Machine (RFM) \citep{radhakrishnan2022feature}, which introduces a novel feature-extracting kernel, together with the probabilistic expressivity of GPs.
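
As a sketch of how these kernel variants relate, consider the following form. The notation is ours and is introduced only for illustration: inputs $x, x'$ of dimension $d$, a symmetric positive semi-definite weighting matrix $M$, and length-scales $\ell, \ell_1, \dots, \ell_d > 0$. A weighted RBF kernel can then be written as
\[
k_M(x, x') = \exp\left(-\frac{1}{2}\,(x - x')^\top M\,(x - x')\right).
\]
The isotropic RBF kernel corresponds to $M = \ell^{-2} I$, the ARD kernel to $M = \mathrm{diag}(\ell_1^{-2}, \dots, \ell_d^{-2})$, and a factorized weighting to $M = W^\top W$ with $W$ of size $r \times d$ for some $r \le d$. In the factorized case,
\[
(x - x')^\top W^\top W\,(x - x') = \| W x - W x' \|_2^2 ,
\]
so $k_M(x, x')$ coincides with a unit-length-scale RBF kernel evaluated on the linearly transformed inputs $Wx$ and $Wx'$. This is one simple instance of the equivalence between a richer kernel and an input transformation followed by a standard kernel.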


\paragraph{Probabilistic boosting.}
Boosting-based approaches \citep{freund1995desicion,friedman2001greedy} yield flexible models that are widely applied to tabular datasets \citep{shwartz2022tabular,grinsztajn2022tree,mcelfresh2023neural}. Such methods include, among others, AdaBoost, XGBoost, LightGBM, and CatBoost \citep{chen2016xgboost,ke2017lightgbm,prokhorenkova2018catboost}. For classification problems, most of these methods have a natural probabilistic interpretation through the estimated class probabilities. For regression problems, however, no such straightforward interpretation exists. Therefore, probabilistic extensions of boosting, such as NGBoost and CatBoost-Ensembles \citep{duan2020ngboost,malinin2021uncertainty}, as well as extensions to Random Forests \citep{schlosser2019distributional,shaker2020aleatoric}, have been proposed. Notably, our GP-RFM performs on par with or better than these probabilistic boosting approaches across a range of evaluation metrics and tabular regression datasets.%\looseness=-1


\paragraph{Neural networks.}
The ability to learn features from data is key to the predictive power of neural networks (NNs). For uncertainty quantification, Bayesian NNs \citep{mackay1992bayesian,neal1996bayesian} are a natural choice. However, their reliance on approximate inference methods such as variational inference \citep{graves2011practical,blundell2015weight} or Markov chain Monte Carlo \citep{welling2011bayesian} makes them computationally expensive. Monte Carlo dropout \citep{gal2016dropout} is a cheaper alternative, but it provides less reliable uncertainty estimates \citep{ovadia2019can,gustafsson2020evaluating} than ensembles of NNs \citep{lakshminarayanan2017simple}. Although deep ensembles set the gold standard for NNs, they require training multiple NNs, which incurs a high computational and memory burden. We leverage the idea of feature learning in NNs through the use of RFMs, since the features learnt by RFMs are closely linked to the features learnt by feedforward NNs \citep{radhakrishnan2022feature}.

