\section{Introduction}
%
\begin{figure*}[t]
    \centering
    \resizebox{\linewidth}{!}{
    \includegraphics[width=\linewidth]{plots/fsvi_prior_varying_smoothness_periodic.pdf}
    }
    \caption{
    Inference with our method (GFSVI) on synthetic data (gray circles) under Gaussian process priors that encode different properties: smoothness (increasing from Matérn-1/2 to RBF) and periodicity (last panel).
    }
    \label{fig:fsvi_prior_varying_smoothness}
\end{figure*}
%
Neural networks have shown impressive results in many fields but fail to provide well-calibrated uncertainty estimates, which are essential in applications associated with risk, such as healthcare \citep{medicineBDL} or finance \citep{analystRecommendationBML}.
Bayesian neural networks (BNNs) offer to combine the scalability and predictive performance of neural networks with principled uncertainty modeling by explicitly capturing epistemic uncertainty, which results from finite training data.
While the choice of prior strongly affects posterior uncertainties, specifying informative priors on BNN weights has proven difficult, and this difficulty is hypothesized to have limited the practical applicability of BNNs \citep{knoblauch2019generalized,tran2022NeedGoodfuncpriorforBNNs}.
For instance, the default isotropic Gaussian prior, which is often chosen for tractability rather than for the beliefs it carries \citep{knoblauch2019generalized}, is known to have pathological behavior in some cases \citep{cinquin2021pathologies,tran2022NeedGoodfuncpriorforBNNs}.
A promising approach to solve this issue is to place priors directly on the function represented by the BNN instead of the weights.
Function-space priors allow incorporating interpretable knowledge, for instance by drawing on the Gaussian process (GP) literature to guide prior design and selection \citep{williams2006gaussian}.

A recent line of work has focused on using function-space priors in BNNs with variational inference (VI) \citep{sun2018functional}. 
VI is appealing because of its successful application to BNNs, its flexibility in terms of approximate posterior parameterization, and its scalability to large datasets and models \citep{hoffman13SVI,blundell2015weight}.    
Unfortunately, for BNNs with function-space priors, the Kullback--Leibler (KL) divergence term in the VI objective, the evidence lower bound (ELBO), involves two intractabilities: (i)~a supremum over infinitely many subsets and (ii)~access to the density of the distribution of the BNN's function, which has no closed-form expression.
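Concretely, in the formulation of \citet{sun2018functional}, and with notation used here only for illustration, the KL divergence between a variational distribution $Q$ and a prior $P$ over functions can be written as
\[
    \mathrm{KL}\left[Q \,\|\, P\right] = \sup_{n \geq 1,\; \mathbf{X} \in \mathcal{X}^n} \mathrm{KL}\left[Q_{\mathbf{X}} \,\|\, P_{\mathbf{X}}\right],
\]
where $\mathcal{X}$ denotes the input space and $Q_{\mathbf{X}}$ and $P_{\mathbf{X}}$ denote the marginal distributions of the function values at the finite set of inputs $\mathbf{X}$.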
\citet{sun2018functional} propose to address problem~(i) by approximating the supremum in the KL divergence by an expectation, and problem~(ii) by using implicit score function estimators, which make the method difficult to use in practice \citep{ma2021funcVIspg}.
However, the problem is actually more severe.
Not only is the KL divergence intractable, it is also infinite in most cases of interest \citep{burt2020understanding}, such as when the prior is a non-degenerate GP or a BNN with a different architecture.
Thus, in these (and many more) situations, the KL divergence cannot even be approximated.
As a consequence, more recent work abandons BNNs and instead uses deterministic neural networks to parameterize basis functions \citep{ma2021funcVIspg} or a GP mean \citep{wild2022gvi}.
The only prior work \citep{rudner2022fsvi} that overcomes the issue pointed out by \citet{burt2020understanding} does so by deliberately limiting itself to cases where the KL divergence is known to be finite (by defining the prior as the pushforward of a weight-space distribution).
Therefore, the method by \citet{rudner2022fsvi} suffers from the same issues regarding prior specification as other weight-space inference methods.

In this paper, we address the argument by \citet{burt2020understanding} that VI does not provide a valid objective for inference in BNNs with genuine function-space priors, and we propose to apply the framework of generalized VI \citep{knoblauch2019generalized}.
We present a simple method for function-space inference with GP priors that builds on the regularized KL divergence~\citep{quang2019regularizedKL}, which generalizes the conventional KL divergence and is finite for any pair of Gaussian measures.
We obtain a Gaussian measure for the variational posterior by considering the linearized BNN from \citet{rudner2022fsvi}, and we are free to choose a function-space prior from a large set of GPs that have an associated Gaussian measure on the considered function space.
While the regularized KL divergence is still intractable, it can be consistently estimated from samples with a known error bound.
We find that our method effectively incorporates the beliefs specified by GP priors (see \cref{fig:fsvi_prior_varying_smoothness}, discussed further in \cref{sec:experiments}) and that it yields competitive performance compared to BNN baselines.
To the best of our knowledge, our method is the first to provide a well-defined objective for function-space inference in BNNs with informative GP priors. 
Our contributions are summarized below:
\begin{enumerate}
    \item We use generalized VI with the \emph{regularized} KL divergence to mitigate the issue of an infinite KL divergence when using VI in BNNs with function-space priors.
    \item We present a new and well-defined objective for function-space inference in the linearized BNN with GP priors, resulting in a simple algorithm.
    \item We show that our method accurately captures structural properties specified by the GP prior and provides competitive uncertainty estimates for regression, classification, and out-of-distribution detection compared to baselines with both function- and weight-space priors.
\end{enumerate}
The paper is structured as follows: \cref{sec:background} introduces function-space VI and the regularized KL divergence; \cref{sec:methods} presents our method for generalized function-space VI (GFSVI) in BNNs; \cref{sec:experiments} reports experimental results; \cref{sec:related_work} discusses related work; and \cref{sec:discussion} concludes.