\section{Related work}
\label{sec:related_work}
%
In this section, we review related work on function-space VI with neural networks and on approximating function-space measures with weight-space priors.
%
\paragraph{Function-space inference with neural networks.}
Prior work on function-space VI in BNNs has addressed the two issues discussed in Section~\ref{sec:fsvi_in_bnns}: (i)~the intractable variational posterior in function space and~(ii)~the intractable KL divergence.
\citet{sun2018functional} address~(i) by using implicit score function estimators, and~(ii) by replacing the supremum with an expectation.
\citet{rudner2022fsvi} address~(i) by using a linearized BNN \citep{khan2020approximate, immer2021linlaplace, maddox2021FastAdapt}, and~(ii) by replacing the supremum with a maximum over a finite set.
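For concreteness, the two approximations can be summarized as follows, in notation introduced here only for illustration. Let $Q$ and $P$ denote the variational and prior measures over functions $f\colon\mathcal{X}\to\mathbb{R}$, and let $Q_{\mathbf{X}}$ and $P_{\mathbf{X}}$ be their marginals at a finite set of inputs $\mathbf{X}\in\mathcal{X}^n$. The function-space KL divergence of \citet{sun2018functional} and the two approximations described above then read
\begin{align*}
  D_{\mathrm{KL}}(Q \,\|\, P) &= \sup_{n\in\mathbb{N},\, \mathbf{X}\in\mathcal{X}^n} D_{\mathrm{KL}}(Q_{\mathbf{X}} \,\|\, P_{\mathbf{X}}), \\
  \text{(expectation)}\quad D_{\mathrm{KL}}(Q \,\|\, P) &\approx \mathbb{E}_{\mathbf{X}\sim c}\big[D_{\mathrm{KL}}(Q_{\mathbf{X}} \,\|\, P_{\mathbf{X}})\big], \\
  \text{(finite maximum)}\quad D_{\mathrm{KL}}(Q \,\|\, P) &\approx \max_{\mathbf{X}\in\{\mathbf{X}_1,\dots,\mathbf{X}_M\}} D_{\mathrm{KL}}(Q_{\mathbf{X}} \,\|\, P_{\mathbf{X}}),
\end{align*}
where $c$ is a distribution over finite sets of measurement points and $\{\mathbf{X}_1,\dots,\mathbf{X}_M\}$ is a finite candidate collection; both are placeholders, and the specific choices made by each method may differ.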
Other work abandons approximating the neural network's posterior and instead uses a BNN to specify a prior \citep{ma2019variational}, or uses deterministic neural networks as features for Bayesian linear regression \citep{ma2021funcVIspg} or as the mean of a generalized sparse GP \citep{wild2022gvi}.
In contrast to the more expressive GP posterior covariance used in our work, \citet{wild2022gvi} use a simple stationary sparse GP posterior covariance (\cref{tab:classification}), which has a higher sampling cost and can lead to model misspecification (\cref{fig:gfsvi_vs_gwi_periodic}).
Our work combines linearized BNNs with generalized VI, but uses the regularized KL divergence \citep{quang2019regularizedKL}, which naturally generalizes the KL divergence and allows for informative GP priors.
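As a concrete illustration (our restatement, with symbols introduced only here), consider the marginals of two Gaussian measures at $n$ input points, $\mathcal{N}(\mathbf{m}_1, \mathbf{C}_1)$ and $\mathcal{N}(\mathbf{m}_2, \mathbf{C}_2)$ with positive semi-definite $\mathbf{C}_1, \mathbf{C}_2 \in \mathbb{R}^{n\times n}$. In this finite-dimensional case, the regularized KL divergence with parameter $\gamma > 0$ reduces to the standard KL divergence after adding $\gamma\mathbf{I}$ to both covariance matrices:
\begin{align*}
  D_{\mathrm{KL}}^{\gamma}\big(\mathcal{N}(\mathbf{m}_1, \mathbf{C}_1) \,\|\, \mathcal{N}(\mathbf{m}_2, \mathbf{C}_2)\big)
  &= D_{\mathrm{KL}}\big(\mathcal{N}(\mathbf{m}_1, \mathbf{A}_1) \,\|\, \mathcal{N}(\mathbf{m}_2, \mathbf{A}_2)\big) \\
  &= \tfrac{1}{2}\Big[\operatorname{tr}\big(\mathbf{A}_2^{-1}\mathbf{A}_1\big) - n + (\mathbf{m}_2 - \mathbf{m}_1)^\top \mathbf{A}_2^{-1} (\mathbf{m}_2 - \mathbf{m}_1) + \log\frac{\det \mathbf{A}_2}{\det \mathbf{A}_1}\Big],
\end{align*}
with $\mathbf{A}_i = \mathbf{C}_i + \gamma\mathbf{I}$ for $i \in \{1, 2\}$. Since each $\mathbf{A}_i$ is positive definite, the expression stays finite even when $\mathbf{C}_1$ or $\mathbf{C}_2$ is singular, and it recovers the standard KL divergence as $\gamma \to 0$ whenever both covariances are nonsingular.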
%
\paragraph{Approximating function-space measures with weight-space priors.}
%
\citet{FlamShepherd2017MappingGP,tran2022NeedGoodfuncpriorforBNNs} minimize a divergence between the BNN's prior predictive and a GP before performing inference over the weights, while \citet{wu2023indirect} directly incorporate the bridging divergence into the inference objective.
Alternatively, \citet{pearce2020GPpriorsBNN} derive BNN architectures mirroring GPs, and \citet{matsubara2022ridgelet} use the Ridgelet transform to design weight-space priors approximating a GP in function space.
Similarly, \citet{rudner2023FuncReg} and \citet{sam2024bayesianneuralnetworksdomain} use empirical weight-space priors to regularize in function space and to encode domain knowledge specified via a loss function, respectively. \citet{yang2020outputconstrainedBNN} instead impose functional constraints directly via the prior.