

\section{Introduction}\label{sec:introduction}

During the first year of my PhD, I have taken several graduate courses and focused on a project on conditional Bayesian quadrature (CBQ). Bayesian quadrature belongs to the broad area of probabilistic numerics~\cite{hennig2015probabilistic}, which aims to solve numerical problems using tools from probabilistic machine learning such as Gaussian processes and Bayesian neural networks.
Probabilistic numerical methods quantify the uncertainty that arises from finite computation when, for example, numerically computing an integral or solving a differential equation.
Moreover, under mild assumptions on the smoothness of the integrand, probabilistic numerical methods such as Bayesian quadrature can converge to the true value faster than standard Monte Carlo~\cite{fx_quadrature}.

I also attended the Cambridge Machine Learning Summer School, where I learned about recent advances in probabilistic machine learning.
I have regularly attended seminars at the Gatsby Unit to deepen my understanding of machine learning theory, as well as the weekly CSML seminar.
The weekly Wednesday seminar organized by the Foundational AI CDT is also a great opportunity to connect with peers from all cohorts.

In terms of career development, I expect to complete an industry internship at some stage of my PhD, as I would like to find out whether an industry career suits me. I also plan to spend some time overseas working with research groups at another academic institution.
Later in my PhD, I will also attend the Foundational AI CDT entrepreneurship training.

The following sections are taken from a completed manuscript on conditional Bayesian quadrature, which is being submitted to NeurIPS 2023 for review. Conditional Bayesian quadrature opens up many potential research directions; the most straightforward is to use its uncertainty estimates for active learning in order to improve sample efficiency.

\section{Introduction to CBQ}\label{sec:introduction_to_cbq}
This paper considers the computational challenge of estimating intractable expectations that arise in machine learning, statistics, and beyond. Given a function $f:\calX \times \Theta \rightarrow \R$, we are interested in estimating \emph{conditional expectations} (sometimes also called parametric expectations) $I: \Theta \rightarrow \R$ uniformly over the parameter space $\Theta$, where:
\begin{align}\label{eq:conditional_expectation}
    I(\theta) = \E_{X \sim \mathbb{P}_\theta}[f(X,\theta)]=\int_\calX  f(x, \theta) \mathbb{P}_\theta(\mathrm{d} x), 
\end{align}
and $\{\mathbb{P}_\theta\}_{\theta \in \Theta}$ is a family of distributions on the integration domain $\calX$. We will assume that $I(\theta)$ is sufficiently smooth in $\theta$, but that it is not available in closed form and must be approximated through samples and function evaluations. 
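To make the notation concrete, consider a toy example of our own (not one of the applications discussed below): let $\calX = \Theta = \R$, $\mathbb{P}_\theta = \mathcal{N}(\theta, 1)$ and $f(x, \theta) = \cos(\theta x)$. Since the characteristic function of $X \sim \mathcal{N}(\mu, 1)$ is $\E[e^{\mathrm{i} s X}] = e^{\mathrm{i} s \mu - s^2/2}$, taking $s = \mu = \theta$ gives
\begin{align*}
    I(\theta) = \mathrm{Re}\, \E_{X \sim \mathcal{N}(\theta, 1)}\big[e^{\mathrm{i} \theta X}\big] = \mathrm{Re}\, e^{\mathrm{i}\theta^2 - \theta^2/2} = \cos(\theta^2)\, e^{-\theta^2/2}.
\end{align*}
Here $I$ is smooth in $\theta$ and, unlike in realistic problems, known in closed form, which makes such examples convenient for checking estimators.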

Conditional expectations arise when calculating tail probabilities in rare-event simulation \citep{Tang2013} and when computing moment generating, characteristic, or cumulative distribution functions \citep{Giles2015,Krumscheid2018}. They also arise when computing the conditional value at risk or the valuation of various options \citep{longstaff2001valuing,alfonsi2022many}, in Bayesian sensitivity analysis \citep{Lopes2011,Kallioinen2021}, and more broadly in the sensitivity analysis of scientific models, for example through Sobol indices \citep{Sobol2001}. Parametric expectations $I(\theta)$ also often appear as intermediate quantities.
For example, given $\phi:\R \rightarrow \R$ and a probability distribution $\mathbb{Q}$ on $\Theta$, we are often interested in the \emph{nested expectation} $\mathbb{E}_{\theta \sim \mathbb{Q}}[\phi(I(\theta))]$ \citep{Hong2009,Rainforth2018}. Nested expectations arise when computing the expected information gain in Bayesian experimental design \citep{Chaloner1995} and the expected value of partial perfect information in health economics~\citep{heath2017review}.
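A plain nested Monte Carlo estimator of such a quantity can be sketched as follows. The setting is a toy assumption of our own, not taken from the cited works: $\mathbb{Q} = \mathrm{Uniform}(-1, 1)$, $\mathbb{P}_\theta = \mathcal{N}(\theta, 1)$, $f(x, \theta) = x^2$ (so that $I(\theta) = \theta^2 + 1$) and $\phi(z) = z^2$, for which $\E_{\theta\sim\mathbb{Q}}[\phi(I(\theta))] = 1/5 + 2/3 + 1 = 28/15$. The function name is hypothetical.

```python
import numpy as np


def nested_monte_carlo(phi, n_outer=4000, n_inner=200, seed=0):
    """Nested Monte Carlo estimate of E_{theta ~ Q}[phi(I(theta))].

    Toy assumptions: Q = Uniform(-1, 1), P_theta = N(theta, 1), f(x, theta) = x^2,
    so that I(theta) = theta^2 + 1.
    """
    rng = np.random.default_rng(seed)
    # Outer samples theta_j ~ Q.
    theta = rng.uniform(-1.0, 1.0, size=n_outer)
    # Inner samples x_{j,i} ~ P_theta_j, one row per outer sample.
    x = rng.normal(loc=theta[:, None], scale=1.0, size=(n_outer, n_inner))
    # Inner Monte Carlo estimate of I(theta_j) for each j.
    inner_estimates = np.mean(x**2, axis=1)
    # Outer average of phi applied to the inner estimates.
    return np.mean(phi(inner_estimates))


estimate = nested_monte_carlo(lambda z: z**2)
exact = 28.0 / 15.0  # E[(theta^2 + 1)^2] for theta ~ Uniform(-1, 1)
print(estimate, exact)
```

This uses $4000 \times 200 = 800{,}000$ evaluations of $f$. Because $\phi$ is nonlinear, the estimator is biased for any finite number of inner samples; for $\phi(z) = z^2$ the bias is exactly $\E_{\theta\sim\mathbb{Q}}[\mathrm{Var}(\hat{I}(\theta))] = (10/3)/200 = 1/60$ here.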

Methods for computing $I(\theta)$ generally select $T$ parameter values $\theta_1,\ldots,\theta_T \in \Theta$, then simulate $N$ realisations from each corresponding probability distribution $\mathbb{P}_{\theta_1}, \ldots, \mathbb{P}_{\theta_T}$ at which they evaluate the integrand $f$, leading to a total of $N T$ evaluations. 
Classical Monte Carlo can be used to estimate $I(\theta_1), \ldots, I(\theta_T)$, but in many applications we are also interested in estimating either $I(\theta^*)$ for a fixed $\theta^* \notin \{\theta_1,\ldots,\theta_T\}$, or $I(\theta)$ uniformly over $\theta \in \Theta$.
As a result, a second step is required to combine the estimates of $I(\theta_1), \ldots, I(\theta_T)$ into an estimate of $I(\theta^*)$ or of the whole function $\theta \mapsto I(\theta)$.
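Concretely, writing $x_{t,1},\ldots,x_{t,N}$ for the samples drawn from $\mathbb{P}_{\theta_t}$, the first step typically computes the Monte Carlo estimators
\begin{align*}
    \hat{I}_{\mathrm{MC}}(\theta_t) = \frac{1}{N}\sum_{i=1}^{N} f(x_{t,i}, \theta_t), \qquad t = 1, \ldots, T,
\end{align*}
whose root-mean-square error is $\sqrt{\mathrm{Var}_{X \sim \mathbb{P}_{\theta_t}}[f(X,\theta_t)]/N}$ when the samples are independent. On their own, these $T$ numbers provide no estimate of $I(\theta^*)$ at any other parameter value.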

The most straightforward approach to estimating conditional expectations is importance sampling \citep{Glynn1989,Madras1999,Tang2013,Demange-Chryst2022}, where $I(\theta)$ is estimated by weighting function evaluations to account for the fact that the samples were not drawn from $\mathbb{P}_\theta$.
Unfortunately, this approach is only applicable when $f$ does not depend on $\theta$, since otherwise the stored function evaluations cannot be reused for a new value of $\theta$, and it is usually difficult to identify an appropriate importance distribution.
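The sample reuse that makes importance sampling attractive can be sketched as follows. This is a minimal illustration under toy assumptions of our own, not taken from the cited works: $f(x) = x^2$ does not depend on $\theta$, $\mathbb{P}_\theta = \mathcal{N}(\theta, 1)$ so that $I(\theta) = \theta^2 + 1$, and a single proposal $\mathcal{N}(0, 3^2)$ serves every $\theta$. The function names are hypothetical.

```python
import numpy as np


def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated elementwise at x."""
    z = (x - mean) / std
    return np.exp(-0.5 * z**2) / (std * np.sqrt(2.0 * np.pi))


def f(x):
    # Toy integrand that does NOT depend on theta, so its evaluations can be reused.
    return x**2


def importance_sampling(thetas, n_samples=200_000, proposal_std=3.0, seed=0):
    """Estimate I(theta) = E_{X ~ N(theta, 1)}[f(X)] for every theta in `thetas`.

    All estimates share one set of samples from the proposal Q = N(0, proposal_std^2)
    and one set of function evaluations; only the weights change with theta.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, proposal_std, size=n_samples)
    fx = f(x)                              # N evaluations of f, computed once
    q = normal_pdf(x, 0.0, proposal_std)   # proposal density at the samples
    estimates = []
    for theta in thetas:
        w = normal_pdf(x, theta, 1.0) / q  # importance weights dP_theta / dQ
        estimates.append(np.mean(w * fx))
    return np.array(estimates)


thetas = np.array([0.0, 1.0, 2.0])
estimates = importance_sampling(thetas)
exact = thetas**2 + 1.0  # E[X^2] = theta^2 + 1 for X ~ N(theta, 1)
print(estimates, exact)
```

The proposal is deliberately wider than every $\mathbb{P}_\theta$ so that the weights stay bounded; finding such a proposal in realistic problems is precisely the difficulty noted above.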
Alternatively, least-squares Monte Carlo \citep{longstaff2001valuing,alfonsi2022many} or regression-based kernel mean shrinkage estimators \citep{muandet2016kernelmeanshrinkage,chau2021deconditional} first estimate $I(\theta_1),\ldots, I(\theta_T)$ through Monte Carlo, and then estimate $I(\theta)$ through linear, polynomial or kernel ridge regression on these $T$ Monte Carlo estimates. The accuracy of these methods therefore depends on that of both the Monte Carlo estimators and the regression method.
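The two-step structure of least-squares Monte Carlo can be sketched as below, using plain polynomial regression for the second step. The toy setting is our own assumption: $\mathbb{P}_\theta = \mathcal{N}(\theta, 1)$ and $f(x, \theta) = \cos(\theta x)$, for which $I(\theta) = \cos(\theta^2)e^{-\theta^2/2}$; the values $T = 20$, $N = 500$, the polynomial degree and the function names are likewise illustrative.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def f(x, theta):
    # Toy integrand depending on both x and theta (assumption for illustration).
    return np.cos(theta * x)


def true_I(theta):
    # Closed form for X ~ N(theta, 1): E[cos(theta X)] = cos(theta^2) exp(-theta^2 / 2).
    return np.cos(theta**2) * np.exp(-theta**2 / 2)


def least_squares_monte_carlo(theta_train, n_samples=500, degree=4, seed=0):
    """Two-step estimator: Monte Carlo at each theta_t, then polynomial regression."""
    rng = np.random.default_rng(seed)
    # Step 1: N draws from P_theta_t = N(theta_t, 1) for each of the T parameter values.
    x = rng.normal(loc=theta_train[:, None], scale=1.0,
                   size=(theta_train.size, n_samples))
    mc_estimates = f(x, theta_train[:, None]).mean(axis=1)
    # Step 2: least-squares polynomial fit of the T Monte Carlo estimates against theta.
    coeffs = P.polyfit(theta_train, mc_estimates, degree)
    return lambda theta: P.polyval(theta, coeffs)


theta_train = np.linspace(-1.0, 1.0, 20)        # T = 20 parameter values
I_hat = least_squares_monte_carlo(theta_train)  # N * T = 10,000 evaluations of f
theta_star = 0.37                               # not one of the training values
print(I_hat(theta_star), true_I(theta_star))
```

The prediction at $\theta^*$ inherits errors from both steps: the Monte Carlo noise in the $T$ first-step estimates and the bias of the chosen regression model.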

In addition, all of these methods suffer from two main limitations. Firstly, they are very sample-intensive, that is, they require a large number of function evaluations to reach a given level of accuracy, which makes them infeasible when sampling or evaluating the integrand is expensive. Secondly, obtaining a reliable finite-sample quantification of uncertainty for $I(\theta)$ is often infeasible. This is a significant limitation for challenging integration problems, where we would ideally like to know how accurate our estimator is likely to be.

To tackle these limitations, we propose a novel algorithm called \emph{conditional Bayesian quadrature} (CBQ). The name comes from the fact that our approach extends the Bayesian quadrature algorithm~\citep{Diaconis1988,OHagan1991BayesHermiteQ,Rasmussen2003,fx_quadrature} to the computation of conditional expectations. As such, CBQ falls in the line of work on probabilistic numerical methods \citep{hennig2015probabilistic,Cockayne2017BPNM,Oates2019Modern,Hennig2022}.
Our algorithm is based on a hierarchical Bayesian model consisting of two stages of Gaussian process regression, and leads to a univariate Gaussian posterior distribution on $I(\theta)$ whose mean and variance are parametrised by $\theta$.
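A two-stage Gaussian process model of this kind can be sketched as follows. This is a minimal illustration in the spirit of the description above, not the algorithm formalised in \Cref{sec:cbq}: the RBF kernels with fixed lengthscales, zero prior means, Gaussian $\mathbb{P}_\theta = \mathcal{N}(\theta, 1)$ (so the kernel mean is available in closed form), the use of the stage-one Bayesian quadrature variances as heteroscedastic noise in stage two, the toy integrand $f(x, \theta) = \cos(\theta x)$ and all function names are our own assumptions.

```python
import numpy as np


def rbf(a, b, lengthscale):
    """Gaussian (RBF) kernel matrix between 1-D arrays a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)


def bq_gaussian(x, y, mu, sigma, lengthscale=1.0, jitter=1e-6):
    """Stage 1: Bayesian quadrature for E_{X ~ N(mu, sigma^2)}[g(X)] from y = g(x).

    Zero-mean GP prior with an RBF kernel; for a Gaussian measure the kernel
    mean and the initial error are available in closed form.
    """
    K = rbf(x, x, lengthscale) + jitter * np.eye(x.size)
    s2 = lengthscale**2 + sigma**2
    z = lengthscale / np.sqrt(s2) * np.exp(-0.5 * (x - mu) ** 2 / s2)  # kernel mean at x
    c = lengthscale / np.sqrt(lengthscale**2 + 2.0 * sigma**2)        # double integral of k
    mean = z @ np.linalg.solve(K, y)
    var = max(c - z @ np.linalg.solve(K, z), 0.0)  # clip tiny negative round-off
    return mean, var


def cbq_sketch(thetas, xs, ys, theta_query, ls_theta=0.5, amp=1.0, jitter=1e-6):
    """Stage 2: GP regression over theta on the stage-1 BQ posteriors.

    Assumes P_theta = N(theta, 1) and uses the BQ variances as heteroscedastic noise.
    """
    stage1 = [bq_gaussian(x, y, mu=t, sigma=1.0) for t, x, y in zip(thetas, xs, ys)]
    m = np.array([mean for mean, _ in stage1])
    v = np.array([var for _, var in stage1])
    K = amp * rbf(thetas, thetas, ls_theta) + np.diag(v + jitter)
    k_star = amp * rbf(theta_query, thetas, ls_theta)
    post_mean = k_star @ np.linalg.solve(K, m)
    post_var = amp - np.sum(k_star * np.linalg.solve(K, k_star.T).T, axis=1)
    return post_mean, np.maximum(post_var, 0.0)


rng = np.random.default_rng(0)
T, N = 10, 50
thetas = np.linspace(-1.0, 1.0, T)
xs = [rng.normal(t, 1.0, size=N) for t in thetas]  # x_{t,i} ~ P_theta_t = N(theta_t, 1)
ys = [np.cos(t * x) for t, x in zip(thetas, xs)]   # toy integrand f(x, theta) = cos(theta x)
theta_query = np.array([0.37, -0.55])
mean, var = cbq_sketch(thetas, xs, ys, theta_query)
exact = np.cos(theta_query**2) * np.exp(-theta_query**2 / 2)
print(mean, var, exact)
```

Each query returns a Gaussian posterior on $I(\theta^*)$ from $NT = 500$ evaluations of $f$, and its variance is the kind of finite-sample uncertainty estimate discussed earlier. In practice the kernel hyperparameters would typically be selected, e.g. by maximising the marginal likelihood, rather than fixed by hand.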

This approach allows us to mitigate both limitations of existing methods. Firstly, we show both theoretically and empirically that our method is more sample-efficient than alternatives under mild smoothness conditions on $f$ and $I(\theta)$, provided that the dimensions of $\calX$ and $\Theta$ are not too large. As a result, smaller $N$ and $T$ suffice to achieve a desired accuracy, which makes the method preferable when sampling or evaluating the integrand is expensive. Secondly, since we obtain a full posterior distribution on $I(\theta)$, we can provide a finite-sample Bayesian quantification of uncertainty.

The remainder of the paper is structured as follows. In \Cref{sec:background}, we review existing methods for computing conditional expectations, as well as Bayesian quadrature.
In \Cref{sec:cbq}, we formalize our algorithm, \textit{conditional Bayesian quadrature}.
In \Cref{sec:theory}, we establish the convergence rate of our method.
In \Cref{sec:experiments}, we present empirical results and compare our method against baselines.



