\section{Discussion and future work}
In this study, we adopted the RFM, a novel data-adaptive feature-learning kernel, for uncertainty quantification by integrating it into GPs.
We evaluated our method across a range of datasets and metrics to check that its behaviour is consistent. Our results demonstrate that the RFM-based GP matches or outperforms existing state-of-the-art methods, including boosting-based approaches such as NGBoost \citep{duan2020ngboost} and CatBoost-ensembles \citep{malinin2021uncertainty}.

%We "bridge fields" with a method which "has not previously featured in the GP literature" [\Rone]. Here, we demonstrate that a non-diagonal metric as used by the RFM "is something that the community has been missing" 
The GP literature has focused on ARD-based approaches or low-rank feature matrices $\mM$ \citep{garnett2014active,letham2020re}. We show, with illustrative examples, that the presented GP-RFM with a full feature matrix $\mM$ outperforms these approaches because it can reliably model relevant correlations between covariates. We thereby bridge the two fields and demonstrate an approach that the GP community has been missing.
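
To make the contrast concrete, the three parametrizations can be written as choices of the metric inside a radial kernel. The following is a sketch under assumed notation: the inputs $x, z$, the radial profile $\phi$, the lengthscales $\ell_j$, the dimension $d$, and the $r \times d$ projection $A$ are introduced here for illustration only.
\[
k_{\mM}(x,z) = \phi\big(\|x-z\|_{\mM}\big), \qquad \|x-z\|_{\mM}^2 = (x-z)^\top \mM \, (x-z).
\]
With this form, ARD corresponds to a diagonal metric, a low-rank embedding to a factorized metric, and the full feature matrix to an unrestricted positive semi-definite metric:
\[
\mM_{\mathrm{ARD}} = \mathrm{diag}\big(\ell_1^{-2}, \dots, \ell_d^{-2}\big), \qquad
\mM_{\mathrm{low\text{-}rank}} = A^\top A \ \ (r < d), \qquad
\mM_{\mathrm{full}} \succeq 0 .
\]
Writing $\delta = x - z$, the squared distance expands as
\[
\|x-z\|_{\mM}^2 = \sum_{j} \mM_{jj}\, \delta_j^2 + \sum_{i \neq j} \mM_{ij}\, \delta_i \delta_j ,
\]
so the second sum, which vanishes for a diagonal $\mM$, is the term through which a full matrix can account for interactions between covariates.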

However, our empirical findings suggest that RFMs are occasionally surpassed by their diagonal version, RFM-diag, or by kernels with ARD. We observed that sample complexity plays a pivotal role in this behaviour: given sufficient training samples, the full RFM was preferable in our experiments, whereas for limited sample sizes the diagonal RFM can be preferable. Determining the optimal method for a given setting is beyond the scope of this paper, but it is a crucial direction for future research.

Another line of future research is to integrate other kernels into the RFM framework. RFM is a broad kernel-based feature learning framework that is suitable for any radial kernel. This study concentrates on the two most prevalent kernels, the Gaussian (RBF) and the Laplacian, and our results show that the Laplacian kernel outperforms the Gaussian kernel. Selecting a task-specific kernel could improve performance further; candidates include Neural Tangent Kernels (NTK) \citep{jacot2018neural} and Convolutional Neural Tangent Kernels (CNTK) \citep{li2019enhanced}.

Another crucial aspect is scalability. Boosted decision-tree methods handle large datasets naturally, whereas kernel methods have historically been difficult to scale. Recent state-of-the-art techniques have made scaling kernel methods feasible; notable examples are the EigenPro series \citep{ma2017diving,ma2019kernel,pmlr-v202-abedsoltan23a,pmlr-v238-abedsoltan24a} and FALKON \citep{rudi2017falkon,meanti2020kernel}. These advances could enable our method to scale effectively to large datasets.


% \hl{limit = 8 pages}