\documentclass{article}

\usepackage{include/aistats2024_author_response}

\usepackage[utf8]{inputenc} % allow utf-8 input
\usepackage[T1]{fontenc}    % use 8-bit T1 fonts
% \usepackage{hyperref}       % hyperlinks
% \usepackage{url}            % simple URL typesetting
% \usepackage{booktabs}       % professional-quality tables
% \usepackage{amsfonts}       % blackboard math symbols
% \usepackage{nicefrac}       % compact symbols for 1/2, etc.
% \usepackage{microtype}      % microtypography
% \usepackage{xcolor}         % define colors in text
% \usepackage{xspace}         % fix spacing around commands

\usepackage{microtype}
\usepackage{graphicx}
\usepackage{subfigure}
\usepackage{hyperref}       % hyperlinks
\usepackage{booktabs} % for professional tables
\usepackage{amsfonts}       % blackboard math symbols
\usepackage{nicefrac}       % compact symbols for 1/2, etc.
\usepackage[font=small,labelfont=bf]{caption}

\usepackage[nameinlink]{cleveref} % MUST be added last to work properly



\onecolumn

\makeatletter
\newcommand*{\textoverline}[1]{$\overline{\hbox{#1}}\m@th$}
\makeatother

\setlength{\parindent}{0pt}

\newcommand\blfootnote[1]{%
  \begingroup
  \renewcommand\thefootnote{}\footnote{#1}%
  \addtocounter{footnote}{-1}%
  \endgroup
}

\usepackage[dvipsnames]{xcolor}
\usepackage{tikz}
% internal tools
\newcounter{statementcounter}
\newcommand{\statementcount}{\stepcounter{statementcounter}\thestatementcounter}
\newcommand{\ballnumber}[1]{\tikz[baseline=(myanchor.base)] \node[circle,fill=Blue,inner sep=1pt] (myanchor) {\color{-.}\bfseries\footnotesize #1};}
% response commands
% \newcommand{\statement}[1]{\hspace{2mm}\ifnum0\thestatementcounter>8\ballnumber{\scriptsize{\statementcount}}\else\ballnumber{\statementcount}\fi~\textit{\color{NavyBlue}#1}}
\newcommand{\statement}[1]{\textit{\color{NavyBlue}#1}}
\newcommand{\qt}[1]{\textit{\color{NavyBlue}``#1''}}

\newcommand{\rone}{\textbf{Rev. \#1}}
\newcommand{\rtwo}{\textbf{Rev. \#2}}
\newcommand{\rthree}{\textbf{Rev. \#3}}
\newcommand{\rfour}{\textbf{Rev. \#4}}


\definecolor{darkgreen}{rgb}{0.0, 0.55, 0.55}

% \newcommand{\TR}[1]{\textcolor{red}{TR: #1}}
\newcommand{\HC}[1]{\textcolor{red}{HC: #1}}


\begin{document}

\textbf{\underline{All}}:$\,\,$ We thank the reviewers for their thoughtful and constructive feedback. 
This will help us clarify the many contributions of our paper, and all comments will be taken into account for the camera-ready version of the paper. 
\hrule
\vspace{-0.5mm}


\textbf{\underline{\rone}}:
\statement{The comparison with MOBQ is important enough to be in the main text.} We agree and will make the change.
\statement{Does the smoothness of the kernels impact convergence?}
Yes! Firstly, the convergence rate in $N$ depends explicitly on the smoothness of $k_{\mathcal{X}}$ (i.e. $s_\mathcal{X}$). 
Secondly, the smoothness of both $k_\mathcal{X}$ and $k_\Theta$ impacts the convergence rate \emph{implicitly} through Assumption A4. If A4 is violated, convergence will be slower (Thm 1 \& 9 in [83]). We will clarify this point.

\hrule
\vspace{-1.0mm}

\textbf{\underline{\rtwo}}:
\statement{IS can be used when $f$ depends on $\theta$:}
We respectfully disagree. The ratio $p_\theta(x_i^t) / p_{\theta_t}(x_i^t)$ accounts for integration against $\mathbb{P}_{\theta}$ instead of $\mathbb{P}_{\theta_t}$. However, when $f$ is parameter-dependent, we cannot re-use integrand evaluations: the integrand has shifted from $f(x_i^t, \theta_t)$ to $f(x_i^t, \theta)$, and evaluations of the latter are typically not in our dataset (see start of Section 2).
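To sketch why (assuming, as notation for this response only, that $I(\theta) = \int f(x,\theta)\, p_\theta(x)\, dx$ and that $x_i^t \sim p_{\theta_t}$ for $i = 1, \dots, N$), the standard importance sampling estimator at a new $\theta \neq \theta_t$ would read
\[
I(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \frac{p_\theta(x_i^t)}{p_{\theta_t}(x_i^t)} \, f(x_i^t, \theta).
\]
The weights can be computed from the existing samples, but each $f(x_i^t, \theta)$ is a new integrand evaluation; only $f(x_i^t, \theta_t)$ was computed.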

\vspace{-2.0mm}
\statement{A standard statistical estimation procedure always comes with uncertainty quantification.}
In principle, we do not dispute this statement. However, none of the competitors to CBQ provide exact finite-sample quantification of uncertainty. 
This is likely due to the challenging two-stage nature of the problem. 
In contrast, CBQ allows for seamless propagation of stage I uncertainty to stage II. We would happily revise our statement if provided with evidence to the contrary.

\vspace{-2.0mm}

\statement{Using GP priors [...] is the first solution that comes to mind, the novelty [...] is questionable.}
There are thousands of papers on computing $I(\theta)$ (see e.g. citations to [54]), yet CBQ has never been proposed. Either way, we see being a `natural approach' as a strength rather than a weakness, since it makes wide adoption of CBQ more likely.

\vspace{-2.0mm}
\statement{Grounding an assessment of improvement over numerical experiments is insufficient as it is dependent on multiple calibration choices.} 
Firstly, our assessment is also theoretical: Thm 1 provides guarantees which simply do not exist for competitors.
Secondly, our empirical assessment is meticulous: Appendices C.1--C.4 provide extensive details and the code is on GitHub. Like any ML algorithm, CBQ has hyperparameters that must be selected. We do this by maximising the marginal likelihood (see Appendix B.2), a standard approach for GPs. LSMC and KLSMC also have hyperparameters, and we optimise these according to the recommendations in their respective literature. The empirical assessment is therefore fair.

\vspace{-2.0mm}
\statement{[...] prior information is rarely available.}
This claim goes against the key founding principle of the entire field of probabilistic numerics (see e.g. [35,36,67]) and is not possible to respond to within the amount of space available.

% Practical understanding of $I(\theta)$ provides essential prior information. In our SIR experiment, small changes in the initial infection rate $\theta$ do not significantly alter $I(\theta)$, indicating its smoothness. However, since $I(\theta)$ denotes the peak number of infections, its first derivative may be non-smooth. This insight into $I(\theta)$'s smoothness is key to our kernel selection, as detailed in Appendix C.2.

\vspace{-2.0mm}
\statement{The incorporation of prior is not expanded further since the kernels are chosen for their regularity.}
Smoothness is one of the most common types of prior information encoded with GPs, and we focused on this so that the experiments and theory complement each other. However, it is true that other properties such as sparsity, periodicity or intrinsic low-dimensionality can also be encoded, and we will happily add such examples to the camera-ready version.

% Careful selection of hyperparameters (like smoothness parameter $\nu$ for Matern$-\nu$ kernel) based on prior information are particularly necessary to satisfy the assumptions in Thm 1, as detailed in Appendix C.1-C.4 for every kernel we use.


\vspace{-2.0mm}
\statement{The cost $O(TN^3+T^3)$ is severely detrimental.}
As is usual for BQ methods, the higher computational cost is often offset by faster convergence (see e.g. [15, 42, 44, 53, 66, 83, 84]).
This is explicit throughout the paper; e.g. p2: ``our method is more sample efficient than alternatives under mild smoothness conditions [...] whenever the dimension of $\mathcal{X}$ and $\Theta$ is not too large'', and it is also discussed extensively in Section 4. The most convincing case is in Fig. 3 (middle), which shows that acquiring a \emph{single} sample from the SIR ODEs is more expensive than running the entire CBQ algorithm.


\vspace{-2.0mm}
\statement{Assumptions are brutal [...] densities being lower and upper bounded.}
This is common in the literature; see [15, 44].

\vspace{-2.0mm}
\statement{Quadrature methods are always advantaged when accounting for enough regularity in the integrand.}
This is exactly the intuition behind CBQ, and we are happy to see that you also agree with us!


\vspace{-2.0mm}
\statement{[...] GP can exploit the uncertainty assessment to increase the number of simulations [...]} 
We agree and this is discussed in the conclusion. CBQ provides significant gains in its current form, but this is an exciting avenue for future work. 
% , we mentioned that an adaptive selection to $N$, $T$, the location of $\theta_{1:T}$ and $x^t_{1:N}$ requires elaborate design of using uncertainty from both two stages and is left for future work. 

\vspace{-1.5mm}
\statement{$\mathbb{Q}$ appears abruptly in Thm 1 [...] putting a prior on $\theta$ [...].} $\mathbb{Q}$ is first mentioned on page 1. It is not a prior on $\theta$, but an arbitrary distribution from which $\{\theta_t\}_{t=1}^T$ are assumed to be drawn; this makes our theory more widely applicable.
\hrule
\vspace{-1.0mm}


\textbf{\underline{\rthree}}:
\statement{How to optimally scale $(N, T)$ with fixed budget $B=NT$.}
The slowest term in our upper bound is $O(T^{-1/4})$ and the total cost is $O(TN^3+T^3)$; we therefore need to take $T \rightarrow \infty$ much faster than $N$. We also note that taking $N=O(T^{2/3})$ keeps the overall cost of the same order, $O(T^3)$, so this gives us the optimal budget allocation. We will clarify this point.
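As a rough sketch of this allocation (treating the constants hidden in $O(\cdot)$ as independent of $N$ and $T$, which is a simplifying assumption made here):
\[
N = T^{2/3} \;\Longrightarrow\; TN^3 = T \cdot T^{2} = T^{3}, \qquad B = NT = T^{5/3} \;\Longrightarrow\; T = B^{3/5}, \;\; N = B^{2/5}.
\]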
% In order to balance the two terms in our Thm 1 with fixed $B=NT$, we typically need $T = N^{-3 + 8s_\mathcal{X} / d}$. Given $s_\mathcal{X} > d/2$, we always need more $T$ than $N$, which is intuitive because we need to have infinite $T$ to ensure the error to go to $0$.
\statement{Other presentational comments.} These are all really appreciated and will be implemented as suggested!
\vspace{0.5mm}

\hrule
\vspace{-1.5mm}
\textbf{\underline{\rfour}}: \statement{Why the regularization constant for $\theta$ is $O(\sqrt{T})$, but the regularization constant for $x_i$ is $0$.}
Thm 1 shows that we need $T \rightarrow \infty$ much faster than $N \rightarrow \infty$, because the rate in $N$ is faster than the rate in $T$. We can therefore take $N$ small, and regularisation in $x_i$ is not essential in practice. A more general form of the convergence result (which allows for regularisation in both $N$ \& $T$) can also be found in Thm 3 in Appendix A.
\vspace{-5pt}
\begin{minipage}[h]{0.74\linewidth}
\vspace{-5pt}
\statement{Comparison with ``BQ in stage-I and other regression methods in stage-II''.} Thanks, this is a great idea. The figure on the right presents the results corresponding to Fig. 2 (middle) in the main text. It demonstrates that the improved performance comes from both stage I and stage II. We agree that this comparison should help readers and will therefore provide a comprehensive assessment on all examples in the main text in the camera-ready version.
\end{minipage}
\hspace{0pt}
\begin{minipage}[h]{0.25\linewidth}
% \vspace{5pt}
\includegraphics[width=\linewidth]{aistats_rebuttal_figure.pdf}
\end{minipage}

% \tiny


\end{document}
