\section{Bayesian Quadrature: Three Perspectives}\label{sec:bq}
This section discusses the choice of the weights $w_i$ in \eqref{eq:bmc} from three different perspectives.

\subsection{The First Perspective: Gaussian Process Regression}
Assume a Gaussian process prior on the function $g$, $g(\y) \sim \GP(\m(\y), c(\y, \y'))$. Conditioned on the observations $\{g(\y_i)\}_{i=1}^n$, the posterior $g_n$ is again a Gaussian process with tractable mean and covariance functions, $g_n \sim \GP(\m_n(\y), c_n(\y, \y'))$. Since integration is a linear operation, the integral $\Pi[g_n]$ is a univariate Gaussian random variable with mean and variance
\begin{align}
\begin{split}
    \mathbb{E}\left[\Pi\left[g_n\right]\right] &= \Pi[\boldsymbol{c}(\cdot, \bY)] \boldsymbol{C}^{-1} \bg \\
    \mathbb{V}\left[\Pi\left[g_n\right]\right] &= \Pi \Pi[c(\cdot, \cdot)] - \Pi[\boldsymbol{c}(\cdot, \bY)] \boldsymbol{C}^{-1} \Pi[\boldsymbol{c}(\bY, \cdot)]
\end{split}
\end{align}
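where $\bg = [g(\y_1), g(\y_2), \cdots, g(\y_n)]^\top$ is the vector of observed function values, $\bY = \{\y_i\}_{i=1}^n$ collects the nodes, $\boldsymbol{C}$ is the $n \times n$ matrix with entries $c(\y_i, \y_j)$, and the prior mean $\m$ is taken to be zero.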
The posterior mean is thus a weighted sum of the function values $\{g(\y_i)\}_{i=1}^n$, with weights
\begin{align}
    \w = \boldsymbol{C}^{-1} \Pi[\boldsymbol{c}(\bY, \cdot)] 
\end{align}
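As an illustration of how these weights become computable, consider a one-dimensional case in which the kernel mean has a closed form. The choices below, a squared-exponential covariance with length scale $\ell$ and a Gaussian measure $\pi = \mathcal{N}(0, \sigma^2)$, are assumptions made for this sketch only and are not prescribed by the text above. Taking $c(y, y') = \exp\left(-(y - y')^2 / (2\ell^2)\right)$, the standard Gaussian convolution identity gives
\begin{align*}
    \Pi[c(\cdot, y_i)] &= \int c(y, y_i) \, \mathcal{N}(y; 0, \sigma^2) \, \mathrm{d}y = \sqrt{\frac{\ell^2}{\ell^2 + \sigma^2}} \exp\left(-\frac{y_i^2}{2(\ell^2 + \sigma^2)}\right), \\
    \Pi \Pi[c(\cdot, \cdot)] &= \sqrt{\frac{\ell^2}{\ell^2 + 2\sigma^2}}.
\end{align*}
With a single node $y_1$ we have $\boldsymbol{C} = c(y_1, y_1) = 1$, so the weight is $w_1 = \Pi[c(\cdot, y_1)]$ and the posterior variance of the integral is $\Pi \Pi[c(\cdot, \cdot)] - \Pi[c(\cdot, y_1)]^2$. For instance, with $\ell = \sigma = 1$ and $y_1 = 0$, this gives $w_1 = 1/\sqrt{2} \approx 0.707$ and a posterior variance of $1/\sqrt{3} - 1/2 \approx 0.077$.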

\subsection{The Second Perspective: Minimizing the Estimation Error}
Suppose the function $g$ belongs to a reproducing kernel Hilbert space $\calH_\calY$ with reproducing kernel $k_\calY: \calY \times \calY \to \R$. Integration against $\pi$ is a linear functional from $\calH_\calY$ to $\R$; when this functional is bounded, the Riesz representation theorem allows it to be written as an inner product in $\calH_\calY$:
\begin{align}
    \Pi[g] = \int g \, \mathrm{d}\pi = \langle g, \mu(\pi) \rangle_{\calH_\calY}
\end{align}
where $\mu(\pi) \in \calH_\calY$ is the kernel mean embedding of $\pi$:
\begin{align}
    \mu(\pi)(\cdot) = \int k_\calY(\cdot, \y) \, \mathrm{d}\pi(\y)
\end{align}

The Bayesian Monte Carlo estimator can be regarded as integration with respect to the weighted discrete measure $\hat{\pi} = \sum_{i=1}^n w_i \delta_{\y_i}$, whose kernel mean embedding is $\mu(\hat{\pi}) = \sum_{i=1}^n w_i k_\calY(\cdot, \y_i)$.

By the Cauchy--Schwarz inequality, the estimation error is bounded as
\begin{align}
\begin{split}
    \left|\int g \, \mathrm{d}\pi - \int g \, \mathrm{d}\hat{\pi}\right| &= \left|\langle g, \mu(\pi) - \mu(\hat{\pi})\rangle_{\calH_\calY}\right| \\
    &\leq \|g\|_{\calH_\calY} \|\mu(\pi) - \mu(\hat{\pi})\|_{\calH_\calY}
\end{split}
\end{align}
The squared norm $\|\mu(\pi) - \mu(\hat{\pi})\|_{\calH_\calY}^2$ can be expanded as
\begin{align}
\begin{split}
    \|\mu(\pi) - \mu(\hat{\pi})\|_{\calH_\calY}^2
    &= \sum_{i, j=1}^n w_i w_j k_\calY\left(\y_i, \y_j\right) - 2 \sum_{i=1}^n w_i \int k_\calY\left(\y, \y_i\right) \mathrm{d} \pi(\y) + \iint k_\calY\left(\y, \y^{\prime}\right) \mathrm{d} \pi(\y) \mathrm{d} \pi\left(\y^{\prime}\right) \\
    &= \w^{\top} \bK \w - 2 \w^{\top} \Pi[k_\calY(\bY, \cdot)] + \Pi \Pi[k_\calY(\cdot, \cdot)]
\end{split}
\end{align}
where $\bK$ is the $n \times n$ Gram matrix with entries $\bK_{ij} = k_\calY(\y_i, \y_j)$.

The bound depends on the weights only through this squared norm, which is a convex quadratic function of $\w$. When $\bK$ is invertible, it is minimized by
\begin{align}
    \w = \bK^{-1} \Pi[k_\calY(\bY, \cdot)]  
\end{align}
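The minimization can be made explicit with a short derivation. Writing $\boldsymbol{z} = \Pi[k_\calY(\bY, \cdot)]$ and $J(\w)$ for the squared norm above (both shorthands are introduced only for this sketch) and assuming $\bK$ is invertible, we have $J(\w) = \w^\top \bK \w - 2 \w^\top \boldsymbol{z} + \Pi \Pi[k_\calY(\cdot, \cdot)]$ and
\begin{align*}
    \nabla_{\w} J(\w) &= 2 \bK \w - 2 \boldsymbol{z} = 0 \;\Longrightarrow\; \w = \bK^{-1} \boldsymbol{z}, \\
    J(\bK^{-1} \boldsymbol{z}) &= \Pi \Pi[k_\calY(\cdot, \cdot)] - \boldsymbol{z}^\top \bK^{-1} \boldsymbol{z}.
\end{align*}
Since $\bK$ is positive semi-definite, $J$ is convex and this stationary point is a global minimizer. The minimum value has the same form as the posterior variance in the first perspective when the covariance function $c$ is taken to be $k_\calY$.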

\subsection{The Third Perspective: Ridge Regression Solution}

The integral equals the inner product of the integrand $g$ with the kernel mean embedding $\mu(\pi)$, so an estimate of the embedding yields an estimate of the integral.

In kernel ridge regression with target $\mu(\pi)$, the objective (written here without the regularization term) can be formulated as
\begin{align*}
    \argmin_{F \in \calH_\calY} \calE(F), \quad \calE(F) = \E_{\y}\left[|F(\y) - \mu(\pi)(\y)|^2\right]
\end{align*}
which can be approximated by the empirical estimate
\begin{align}
    \argmin_{F \in \calH_\calY} \widehat{\calE}(F), \quad \widehat{\calE}(F) = \sum_{i=1}^n |F(\y_i) - \mu(\pi)(\y_i)|^2
\end{align}
By the representer theorem, the optimal $F^\ast$ can be expressed as a linear combination of the feature maps $k_\calY(\y_i, \cdot)$:
\begin{align}
    F^\ast = \sum_{i=1}^{n} w_i k_\calY(\y_i, \cdot)
\end{align}
and the optimal weights $w_i$ satisfy the following linear system:
\begin{align*}
    \sum_{i=1}^n w_i k_\calY(\y_i, \y_j) = \mu(\pi)(\y_j) \quad \forall j \in \{1, 2, \cdots, n\}
\end{align*}
When $\bK$ is invertible, the solution of this linear system is $\w = \bK^{-1} \mu(\pi)(\bY)$, where $\mu(\pi)(\bY) = [\mu(\pi)(\y_1), \mu(\pi)(\y_2), \cdots, \mu(\pi)(\y_n)]^\top$ is exactly $\Pi[k_\calY(\bY, \cdot)]$. The integral can therefore be estimated by the inner product $\langle g, F^\ast \rangle_{\calH_\calY} = \sum_{i=1}^n w_i g(\y_i)$, and the weight vector $\w$ is
\begin{align}
    \w = \bK^{-1} \Pi[k_\calY(\bY, \cdot)]
\end{align}
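For reference, the role of the regularizer that gives ridge regression its name can be sketched as follows. The penalty parameter $\lambda \geq 0$, the shorthand $\boldsymbol{z} = \mu(\pi)(\bY)$, and the notation $\widehat{\calE}_\lambda$ are introduced here for illustration and do not appear in the objective above. Substituting $F = \sum_{i=1}^n w_i k_\calY(\y_i, \cdot)$, so that $[F(\y_1), \cdots, F(\y_n)]^\top = \bK \w$ and $\|F\|_{\calH_\calY}^2 = \w^\top \bK \w$, the penalized empirical objective and its stationarity condition are
\begin{align*}
    \widehat{\calE}_\lambda(\w) &= \|\bK \w - \boldsymbol{z}\|_2^2 + \lambda \w^\top \bK \w, \\
    \nabla_{\w} \widehat{\calE}_\lambda(\w) &= 2 \bK \left( (\bK + \lambda \boldsymbol{I}) \w - \boldsymbol{z} \right) = 0.
\end{align*}
For invertible $\bK$ this gives $\w = (\bK + \lambda \boldsymbol{I})^{-1} \boldsymbol{z}$, which reduces to $\w = \bK^{-1} \Pi[k_\calY(\bY, \cdot)]$ at $\lambda = 0$.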

All three perspectives thus lead to the same optimal Bayesian quadrature weights $w_i$, provided the prior covariance $c$ in the first perspective coincides with the reproducing kernel $k_\calY$.