\section{Introduction}\label{sec:intro}
First, we briefly review the set-up of Bayesian quadrature \citep{fx_quadrature}. The goal of Bayesian quadrature is to estimate the following integral of a function $g: \calY \to \R$, which has no closed-form expression:
\begin{align}
    \Pi[g] = \int g d\pi
\end{align}
where $\pi$ is a probability measure on $\calY$.

The standard Monte Carlo estimator draws $n$ samples $\{\y_i\}_{i=1}^n$ from $\pi$ and averages the function values at these samples:

\begin{align}\label{eq:mc}
    \hat{\Pi}_{MC}[g] = \frac{1}{n} \sum_{i=1}^n g(\y_i) 
\end{align}
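
As a brief check on \eqref{eq:mc}, and assuming that the samples $\y_i$ are drawn i.i.d.\ from $\pi$ and that $g$ is square-integrable under $\pi$ (assumptions stated here only for illustration), the estimator is unbiased and its variance decays at rate $1/n$:
\begin{align*}
    \mathbb{E}\big[\hat{\Pi}_{MC}[g]\big] = \frac{1}{n} \sum_{i=1}^n \mathbb{E}[g(\y_i)] = \Pi[g],
    \qquad
    \mathrm{Var}\big(\hat{\Pi}_{MC}[g]\big) = \frac{1}{n} \Big( \Pi[g^2] - \Pi[g]^2 \Big).
\end{align*}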

In contrast, Bayesian Monte Carlo estimates the integral by a weighted sum of the $n$ function values with weights $w_i$, whereas standard Monte Carlo corresponds to equal weights $w_i = 1/n$:

\begin{align}\label{eq:bmc}
    \hat{\Pi}_{BMC}[g] = \sum_{i=1}^n w_i g(\y_i) 
\end{align}
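
To illustrate where such weights can come from, the following sketch assumes a zero-mean Gaussian process prior on $g$ with covariance kernel $k: \calY \times \calY \to \R$ and noise-free evaluations. The symbols $k$, $\mathbf{K}$, $\mathbf{z}$ and $\mathbf{w}$ are introduced here only for illustration. Under these assumptions, and provided $\mathbf{K}$ is invertible, the posterior mean of $\Pi[g]$ has the form \eqref{eq:bmc} with
\begin{align*}
    \mathbf{w} = \mathbf{K}^{-1} \mathbf{z},
    \qquad
    \mathbf{K}_{ij} = k(\y_i, \y_j),
    \qquad
    \mathbf{z}_i = \int k(\y, \y_i) \, d\pi(\y).
\end{align*}
The weights then depend on the sample locations and on the kernel, but not on the observed function values.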

\subsection{Vector-Valued RKHS}

Let $(\Omega, \calF, \mathbb{P})$ be a probability space, and let $X: \Omega \to \calX$ and $Y: \Omega \to \calY$ be two random variables. Let $\calH_\calY$ be an RKHS with inner product $\langle \cdot, \cdot \rangle_{\calH_\calY}$ and kernel $K_\calY: \calY \times \calY \to \R$, and let $\calL(\calH_\calY)$ denote the set of all bounded linear operators from $\calH_\calY$ to itself. We write $x$ and $y$ for elements of $\calX$ and $\calY$, and $g_x: \calX \to \R$ and $g_y: \calY \to \R$ for functions in $\R^\calX$ and $\R^\calY$, respectively. In particular, $g_y \in \calH_\calY \subset \R^\calY$.

We also define a vector-valued RKHS $\calH_\Gamma$, which contains functions $f: \calX \to \calH_\calY$ and has an operator-valued reproducing kernel $k_\Gamma: \calX \times \calX \to \calL(\calH_\calY)$. The construction of $\calH_\Gamma$ is worth recalling because it motivates the definitions that follow. For fixed $x \in \calX$ and $g_y \in \calH_\calY$, the map $f \mapsto \langle g_y, f(x) \rangle_{\calH_\calY}$ is a scalar-valued linear functional on $\calH_\Gamma$, which is bounded by the definition of a vector-valued RKHS. By the Riesz representation theorem, there exists an element $K_{\Gamma x} (g_y) \in \calH_\Gamma$ such that, for all $f \in \calH_\Gamma$:

\begin{align*}
    \PSi{g_y, f(x)}{\calH_\calY} = \PSi{K_{\Gamma x} (g_y), f}{\calH_\Gamma}
\end{align*}

where $K_{\Gamma x}: \calH_\calY \to \calH_\Gamma$ is a linear operator. Intuitively, $K_{\Gamma x}$ plays the role of the feature map. Formally, the kernel operator $k_\Gamma(x,x'): \calH_\calY \to \calH_\calY$ is defined by $k_\Gamma(x, x')(g_{y'}) = K_{\Gamma x'}(g_{y'})(x)$, so that the reproducing property holds:

\begin{align}\label{eq:reproducing_1}
\begin{split}
    \PSi{K_{\Gamma x}(g_y), K_{\Gamma x'} (g_{y'})}{\calH_\Gamma} &= \PSi{g_y, K_{\Gamma x'} (g_{y'})(x)}{\calH_\calY} = \PSi{g_y, k_\Gamma(x, x')(g_{y'})}{\calH_\calY} \\
    &= \PSi{K_{\Gamma x} (g_y)(x'), g_{y'}}{\calH_\calY} = \PSi{k_\Gamma(x', x)(g_y), g_{y'}}{\calH_\calY}
\end{split}
\end{align}
So $k_\Gamma(x, x')$ is the Hilbert adjoint of $k_\Gamma(x', x)$.
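
A standard example, given here as a sketch under assumptions not made in the text above, is the separable kernel. Take a symmetric positive definite scalar kernel $k_\calX: \calX \times \calX \to \R$ (a symbol introduced only for this illustration) and set $k_\Gamma(x, x') = k_\calX(x, x') \, \mathrm{Id}_{\calH_\calY}$. Then $K_{\Gamma x}(g_y) = k_\calX(x, \cdot) \, g_y$, and for $f = K_{\Gamma x'}(g_{y'}) = k_\calX(x', \cdot) \, g_{y'}$, \eqref{eq:reproducing_1} reduces to
\begin{align*}
    \PSi{K_{\Gamma x}(g_y), f}{\calH_\Gamma}
    = k_\calX(x, x') \PSi{g_y, g_{y'}}{\calH_\calY}
    = \PSi{g_y, k_\calX(x', x) \, g_{y'}}{\calH_\calY}
    = \PSi{g_y, f(x)}{\calH_\calY}.
\end{align*}
In this case the adjoint relation is immediate: $k_\calX$ is real and symmetric and the identity is self-adjoint, so the Hilbert adjoint of $k_\Gamma(x', x)$ is $k_\calX(x', x) \, \mathrm{Id}_{\calH_\calY} = k_\Gamma(x, x')$.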



