%\section{Appendix}
%In this appendix, we provide additional content that could not fit in the main paper due to space constraints. In Section \ref{app:related}, we provide additional related work for our paper and specifically discuss the algorithm \singla of \citet{singla2016noisy} in more depth. 

\newpage

\onecolumn
\title{Adaptive Threshold Sampling for Pure Exploration in Submodular Bandits (Supplementary Material)}
\maketitle
\appendix
\section{Additional Related Work}
\label{appdx:related_work}
Approximation algorithms for submodular maximization problems with an exact value oracle have been extensively studied in the literature \cite{nemhauser1978analysis,badanidiyuru2014fast,mirzasoleiman2015lazier,balkanski2019exponential}. For MSMC, the standard greedy algorithm produces a solution set with the best possible $1-1/e$ approximation guarantee in $O(n^2)$ queries of $f$. \cite{badanidiyuru2014fast} proposed a faster greedy-like algorithm that gives an approximation guarantee of $1-1/e-O(\epsilon)$ while reducing the number of queries to $O(\frac{n}{\epsilon}\log\frac{n}{\epsilon})$.

% \textcolor{red}{Make sure there are clear definitions of the problems in the contributions, then we don't have to repeat them here or anywhere else.}
Another variant is USM
%Another problem that is extensively studied is the USM problem, which is to find
%a subset of $U$ that maximizes $f$, where $f$ is non-negative and non-monotone 
\cite{buchbinder2015tight,feige2011maximizing,buchbinder2018deterministic}. Notably, \cite{buchbinder2015tight} introduced a deterministic algorithm that gives a $1/3$ guarantee in $O(n)$ queries to an oracle for $f$, and a randomized version of their algorithm yields the best possible $1/2$ guarantee in expectation in the same number of queries. %\cite{feige2011maximizing} showed that 1/2 is the best approximation guarantee possible for this problem under some complexity assumptions.

% \textcolor{red}{Make sure multilinear extension is defined somewhere, and refer to that section the first time it is mentioned (maybe in contributions)}
%In this work, we also address the Submodular Maximization with Matroid constraints (SMM) problem. 
The final variant of submodular maximization we consider is MSMM
%Various algorithms have been devised for SMM under the assumption of access to an exact value oracle 
\cite{balkanski2019optimal,friedrich2014maximizing,fisher1978analysis}. The greedy algorithm only yields an approximation ratio of $1/2$ in this setting \cite{fisher1978analysis}. However, by extending the discrete submodular function to its continuous counterpart, known as the multilinear extension (see the definition in Section \ref{sec:prelim}), and solving the problem in this regime, an approximation ratio arbitrarily close to the best possible $1-1/e$ can be achieved \cite{badanidiyuru2014fast,calinescu2011maximizing}.

Our work is also related to best-arm identification in the multi-armed bandit literature \cite{audibert2010best,kaufmann2016complexity,jun2016top}, where the objective is to estimate the best action by choosing arms and receiving stochastic rewards from the environment. The most widely considered setting is the PAC learning setting \cite{even2002pac,kalyanakrishnan2012pac,zhou2014optimal}.

Our paper studies the same noisy setting as \cite{singla2016noisy}. There are essentially two versions of \singla: one gives an approximation guarantee of about $1-1/e$ with high probability (as our algorithm \alg does), and the other is randomized and gives the same approximation guarantee in expectation. The benefit of the latter over the former is better sample complexity. The bounds given on the sample complexity of \singla and the ones given in this paper for \alg are instance-dependent and incomparable to one another. We discuss how our algorithm relates to \singla in more depth in Section \ref{app:related}, but we briefly list here the potential advantages of our algorithm \alg compared to \singla: (i) Our algorithm has an approximation guarantee of about $1-1/e$ with high probability, as opposed to an approximation guarantee of about $1-1/e$ in expectation as in the randomized version of \singla; (ii) Our algorithm is not as sensitive to small differences in marginal gain between elements, since it is not based on the standard greedy algorithm as \singla is; (iii) \singla has greater time complexity beyond the sample complexity, because it requires $O(n\log n)$ computations for each noisy query to $\Delta f$; (iv) Our algorithm makes fewer estimations of $\Delta f$ overall, since it is based on a faster variant of the greedy algorithm (\threshold). We further compare the algorithms experimentally in Section \ref{sec:exp_results}.

% \begin{definition}{\textbf{(PAC learning in} \prob\textbf{.)} }
% Fix $\epsilon>0$ and $\delta\in(0,1)$, an algorithm for \prob is $(\epsilon,\delta)$-PAC with a $\beta$-approximation guarantee if the returned solution set $S$ satisfies that $f(S)\geq \beta f(OPT)-\kappa\epsilon$ with probability at least $1-\delta$.
% \end{definition}
% From the definition and the result in Theorem \ref{mainthm}, we can see that our proposed algorithm \alg is $(2\epsilon,\delta)$-PAC with an $(1-1/e-\alpha)$-approximation guarantee.
\subsection{Other noisy model}

Our setting also covers the noisy model in which samples are drawn from a distribution $\mathcal{D}(X)$ to evaluate $f(X)$ itself rather than the marginal gain. This is because, if the noisy evaluation of $f(X)$ is $R$-sub-Gaussian, a noisy evaluation of the marginal gain $\Delta f(X,u)$ can be obtained by taking two independent noisy samples of $f$, one from $\mathcal{D}(X\cup\{u\})$ and one from $\mathcal{D}(X)$, and subtracting the second from the first. The difference of two independent sub-Gaussian random variables is also sub-Gaussian.
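
As a minimal sketch of the last step, assume that $R$-sub-Gaussian means the moment generating function bound $\mathbb{E}\exp(\lambda(Y-\mathbb{E}Y))\leq\exp(\lambda^2R^2/2)$ for all $\lambda\in\mathbb{R}$ (a standard definition, which we assume agrees with the one in Section \ref{sec:prelim}). Write $Y_1$ and $Y_2$ for independent samples from $\mathcal{D}(X\cup\{u\})$ and $\mathcal{D}(X)$ respectively; the names $Y_1,Y_2$ are introduced here only for illustration. Then for every $\lambda\in\mathbb{R}$,
\begin{align*}
    \mathbb{E}\exp\big(\lambda\big((Y_1-Y_2)-\mathbb{E}[Y_1-Y_2]\big)\big)
    &=\mathbb{E}\exp\big(\lambda(Y_1-\mathbb{E}Y_1)\big)\cdot\mathbb{E}\exp\big(-\lambda(Y_2-\mathbb{E}Y_2)\big)\\
    &\leq\exp\left(\frac{\lambda^2R^2}{2}\right)\exp\left(\frac{\lambda^2R^2}{2}\right)
    =\exp\left(\frac{\lambda^2(\sqrt{2}R)^2}{2}\right),
\end{align*}
where the first equality uses independence. Under this assumed definition, the difference is $\sqrt{2}R$-sub-Gaussian, so the marginal gain setting applies with $R$ replaced by $\sqrt{2}R$.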

\section{Comparison with \singla}
\label{app:related}
In this section, we provide more discussion of the related algorithm \singla of \citet{singla2016noisy}. \singla combines the standard greedy algorithm with the best-arm identification algorithm used in the combinatorial bandit literature \cite{chen2014combinatorial}.

In particular, the standard greedy algorithm for \prob \cite{nemhauser1978analysis} goes as follows: A solution $S$ is built by iteratively choosing the element $u\in U$ that maximizes the marginal gain $\Delta f(S,u)$ until the cardinality constraint $\kappa$ is exhausted. \singla considers a noisy setting like ours, in which the element of maximum marginal gain cannot be chosen exactly at each iteration. Instead, it follows the standard greedy algorithm but uses adaptive sampling techniques from the best-arm identification problem to identify the element(s) with the highest marginal gain. The simplest version of their algorithm identifies one element with the highest marginal gain at each iteration, and this version has a guarantee of about $1-1/e$ with high probability, as in \alg. This algorithm is \texttt{EXP-GREEDY} in Section \ref{sec:exp}. However, a downside of this approach is that many samples are often needed to distinguish between elements of nearly the same marginal gain. In contrast, our algorithm \alg does not need to compare marginal gains between elements and therefore does not have this issue.

In order to deal with this sample inefficiency, \singla is generalized to a randomized version. The randomized version of \singla involves a subroutine called \texttt{TOPX}, which adaptively samples marginal gains until a subset of elements with relatively high marginal gains has been identified. Then a randomly selected element of the subset is added to the solution set. In particular, given an integer $0 <\kappa'\leq \kappa$, the TOPX algorithm runs TOP-$l$ selection algorithms for each $l\in\{1,2,\ldots,\kappa'\}$, and each TOP-$l$ selection algorithm runs until it returns a subset of $l$ items with the highest marginal gains with high probability. The TOPX algorithm stops once there exists some $l$ such that the TOP-$l$ selection algorithm ends. This randomized version of \singla has an almost $1-1/e$ approximation guarantee, but the guarantee holds only in expectation rather than with high probability. The case where $\kappa'=\kappa$ is \texttt{EXP-GREEDY-K} in Section \ref{sec:exp}.

Now that we have described the two versions of \singla and their corresponding approximation guarantees, we examine the efficiency of \singla in more detail, in terms of both runtime and sample complexity.

It is proven by \cite{singla2016noisy} that the number of samples taken for each iteration where an element is added to the solution is at most
\begin{align*}
    O\left(n\kappa'R^2\min\left\{\frac{4}{\Delta_{\max}^2},\frac{1}{\epsilon^2}\right\}\log\left(\frac{R^2\kappa n\min\{\frac{4}{\Delta_{\max}^2},\frac{1}{\epsilon^2}\}}{\delta}\right)\right)
\end{align*}
where $\Delta_{\max}$ is the largest difference among the marginal gains of the top $\kappa'$ elements. In other words, this is the number of samples taken each time TOPX is called. Since adding an element involves approximating the marginal gains of all of the elements of $U$, the average sample complexity to compute an approximate marginal gain for a single element is then
\begin{align*}
    O\left(\kappa'R^2\min\left\{\frac{4}{\Delta_{\max}^2},\frac{1}{\epsilon^2}\right\}\log\left(\frac{R^2\kappa n\min\{\frac{4}{\Delta_{\max}^2},\frac{1}{\epsilon^2}\}}{\delta}\right)\right).
\end{align*}
We compare the above to a single call of \samp in our algorithm \alg, which is the analogous computation where we are approximating the marginal gain for an element of $U$. Recall from Theorem \ref{mainthm} that the bound for the sample complexity for \samp is the minimum between
    \begin{align*}
        \left\{\frac{2R^2}{\phi^2(S,u)}\log\left(\frac{4R^2\sqrt{\frac{3nh(\alpha)}{\delta}}}{\phi^2(S,u)}\right),\frac{R^2}{2\epsilon^2}\log \left(\frac{6nh(\alpha)}{\delta}\right)\right\}.
    \end{align*}
If $\kappa'=1$, i.e. the non-randomized version of \singla that has a similar approximation guarantee to our algorithm \alg, then $\Delta_{\max}$ is the difference between the top two marginal gains, which could be very small, and therefore the sample complexity could be quite high. On the other hand, \samp is not sensitive to this property. In order to make $\Delta_{\max}$ bigger, one could increase $\kappa'$ and use the randomized version of \singla. But this case could have worse sample complexity compared to ours as well. If $\Delta_{\max}$ is small and satisfies $\Delta_{\max}=O(\epsilon)$, then the sample complexity of \singla is worse than our averaged sample complexity by a factor on the order of $\kappa'$.
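
To make the last claim concrete, the following is a sketch of the calculation, using only the two bounds displayed above. Since $\frac{4}{\Delta_{\max}^2}\geq\frac{1}{\epsilon^2}$ exactly when $\Delta_{\max}\leq 2\epsilon$, in this regime the minimum equals $\frac{1}{\epsilon^2}$ and the per-element bound for \singla becomes
\begin{align*}
    O\left(\frac{\kappa'R^2}{\epsilon^2}\log\left(\frac{R^2\kappa n}{\delta\epsilon^2}\right)\right),
\end{align*}
whereas the bound for \samp is at most its second term, $\frac{R^2}{2\epsilon^2}\log\left(\frac{6nh(\alpha)}{\delta}\right)$. Setting aside the logarithmic factors, which are not directly comparable, the leading terms differ by a factor proportional to $\kappa'$.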

Further, since \singla follows the standard greedy algorithm, there are $\kappa$ calls made to TOPX. In contrast, \alg is based on the faster variant of the greedy algorithm, \threshold, and so only requires $O(\log(\kappa))$ iterations over $U$.

%\begin{itemize}
%    \item If $\Delta_{\max}$ is small and satisfies that $\Delta_{\max}=O(\epsilon)$, then the sample complexity of \singla is worse than our averaged sample complexity, which is at most $\frac{R^2\log \frac{6nh(\alpha)}{\delta}}{2\epsilon^2}$, by a factor of $O(\kappa')$. This could especially occur if we set $\kappa'$ to be $1$, i.e. the non-randomized version of \singla that has a similar approximation guarantee to our algorithm \alg, and then $\Delta_{\text{max}}$ is the difference between the top two marginal gains. Therefore in order to offset this issue in \singla, we would have to make $\kappa'$ relatively large and we would only have an approximation guarantee in expectation.
    %The worst case average sample complexity of \singla for each element in each iteration
    %\begin{align*}
    %&O(\kappa' R^2\min\{\frac{4}{\Delta_{\max}^2},\frac{1}{\epsilon^2}\}\log\big(\frac{R^2\kappa n\min\{\frac{4}{\Delta_{\max}^2},\frac{1}{\epsilon^2}\}}{\delta}\big))\\
    %&= O(\kappa'R^2\frac{1}{\epsilon^2}\log\big(\frac{R^2\kappa n}{\delta\epsilon^2}\big)).
    %\end{align*}
   %if $\Delta_{\max}$ is small and satisfies that $\Delta_{\max}=O(\epsilon)$,
    %This result is worse than our averaged sample complexity, which is at most $\frac{R^2\log \frac{6nh(\alpha)}{\delta}}{2\epsilon^2}$, by a factor of $O(\kappa')$. 
    %On the other hand, 
    %\item When $\kappa'>1$, the approximation ratio of $f$ for \singla is in expectation while ours is on exact value. 
    % \item Since the algorithm in \cite{singla2016noisy} follows the standard greedy algorithm, the number of iterations required by the main algorithm in \cite{singla2016noisy} is $O(\kappa)$ while our result requires only $O(\log(\kappa))$ number of iterations.
 %   \item  Since the \singla algorithm follows the standard greedy algorithm, the required iteration count stands at $O(\kappa)$ while our result requires only $O(\log(\kappa))$ number of iterations.
%\end{itemize}

%A final factor that makes \alg advantageous relative to \singla is the better overall time complexity. This is because when the algorithm TOPX is computing the estimated marginal gains of all of the items in $U$, it has to maintain a sorted list of $U$ along both an upper bound for the estimated marginal gain, as well as a lower bound for the estimated marginal gain. In particular, each time TOPX makes a noisy query to $\Delta f$, the TOPX algorithm updates the confidence interval for all the elements,
%which requires $O(n)$ computations,
%and then TOPX updates the sorted lists on the two different estimates of marginal gains. However, both \alg and \texttt{EPS-AP} are more sample efficient and require only one update of the confidence interval in line \ref{alg: update confidence interval} and two comparisons in line \ref{line: comparison to thres 1} and \ref{line: comparison to thres 2} in \samp, which is $O(1)$ in computation. 

%\subsection{Time Complexity}
Another factor that makes \alg preferable to \singla is its runtime, beyond its sample complexity. From the description of \singla in \cite{singla2016noisy}, we can see that each time a noisy query to $\Delta f$ is taken, the TOP-$l$ selection algorithm updates the confidence interval for all the elements, and then sorts all elements to find the set $M_t$ of $l$ elements with the highest empirical marginal gain. Then another estimate of the marginal gains is computed as the empirical mean plus or minus the confidence interval, depending on whether the element is in $M_t$. Next, the algorithm sorts the newly obtained estimates to find the top-$l$ set with respect to the new estimates. In contrast, both \alg and \texttt{EPS-AP} have more efficient runtime complexity: per sample, they require only one update of the confidence interval in Line \ref{alg: update confidence interval} and two comparisons in Lines \ref{line: comparison to thres 1} and \ref{line: comparison to thres 2} of \samp, which is only $O(1)$ computation.


\section{Appendix for Section \ref{sec:sampling}}

In this section, we present the omitted content of Section \ref{sec:sampling}. In Section \ref{appdx:compare_to_fixed_eps_approx}, we present a comparison of our result with the fixed $\epsilon$-approximation. In Section \ref{appdx:proof_of_samp}, we present the proof of Theorem \ref{thm:sampling}. In Section \ref{appdx:proof_of_samp2}, we present the proof of Theorem \ref{thm:sampling2}.


\subsection{Comparison of \samp to fixed $\epsilon$-approximation}
\label{appdx:compare_to_fixed_eps_approx}
{In this section, we present a comparison of our result with the fixed $\epsilon$-approximation. A fixed $\epsilon$-approximation applies a concentration inequality, such as Hoeffding's inequality or the Chernoff bound, to a fixed number of noisy samples so that the empirical mean $\hat{X}$ of the evaluated random variable $X$ satisfies $|\hat{X}-\mathbb{E}[X]|\leq\epsilon$ (see also the discussion in Section \ref{sec:prelim}).}

{The fundamental reason this approach is less efficient than \samp is that we are only interested in determining whether $f(X)$ is approximately above a threshold or not, not in obtaining a precise approximation. In other words, we do not need the guarantee $|\hat{X}-\mE X|\leq\epsilon$ from Hoeffding's inequality; instead, we care about whether $\mE X\geq w$. Ideally, we would approximate $f(X)$ just finely enough to determine whether it is above the threshold. However, this is not feasible with the fixed $\epsilon$-approximation, because we have no prior knowledge of how far $f(X)$ is from the threshold. Consequently, we cannot determine the required number of samples in advance, and the fixed $\epsilon$-approximation approach requires a single batch of i.i.d. samples, which limits flexibility.}

{In contrast, \samp uses an adaptive sampling approach where samples are iteratively taken one-by-one until an evolving confidence interval crosses a threshold. The goal of \samp is to use fewer samples compared to a fixed $\epsilon$-approximation. While \samp might initially seem similar to fixed $\epsilon$-approximation, there are several critical differences that introduce unique technical challenges in its development and analysis:
\begin{itemize}
\item Fixed $\epsilon$-approximation approaches apply a concentration inequality once, to a single batch of samples, in order to approximate $\mE X $. In contrast, in \samp, we apply a concentration inequality after every single sample, and then take a union bound over all the applications. This is challenging because we do not know in advance how many samples will be needed to approximate the mean sufficiently well, since that depends on the outcome of the sampling. So we have to carefully design our confidence intervals.
    \item Fixed $\epsilon$-approximation approaches take a predetermined number of samples, independent of the sampling results. In contrast, \samp dynamically determines the number of samples based on the outcomes of previous samples. Additionally, \samp reuses samples across multiple applications of concentration bounds, enhancing its efficiency.
    \item In \samp, the size of the confidence interval evolves with each additional sample, shrinking as the number of samples increases (see Theorem 1). Additionally, when applying concentration inequalities, the failure probability is adjusted dynamically based on how many samples have been taken so far (see the proof of Lemma 6). The benefit of the varying failure probability is that the obtained sample complexity $\frac{8R^2}{\phi_X^2}\log\left(\frac{16R^2}{\phi_X^2}\sqrt{\frac{2}{\delta}}\right)$ does not suffer from small values of $\epsilon$.
    \item In Theorems 2 and 4, we use a combination of the Hoeffding and Chernoff bounds that is well-suited to the threshold algorithms, rather than using one or the other. This approach improves the dependence of the sample complexity on $R$ from $O(R^2)$ in Theorem 1 to $O(R)$ when $R$ is large.
\end{itemize}
}
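
As an illustrative calculation (the numbers here are hypothetical and chosen only for illustration), take $R=1$, $\epsilon=0.1$, and $\delta=0.05$, and compare the two terms of the sample complexity bound in Theorem \ref{thm:sampling}, where $\phi_X=\frac{\epsilon+|w-\mE X|}{2}$ and $\log$ is the natural logarithm. The fixed term is
\begin{align*}
    \frac{2R^2}{\epsilon^2}\log\frac{4}{\delta}=200\log 80\approx 876.
\end{align*}
If the mean is far from the threshold, say $|w-\mE X|=1.9$ so that $\phi_X=1$, the adaptive term is
\begin{align*}
    \frac{8R^2}{\phi_X^2}\log\left(\frac{16R^2}{\phi_X^2}\sqrt{\frac{2}{\delta}}\right)=8\log\left(16\sqrt{40}\right)\approx 37,
\end{align*}
so the bound on the number of samples drops by more than an order of magnitude. If instead $\mE X=w$, then $\phi_X=0.05$ and the adaptive term is approximately $3.4\times 10^{4}$, so the minimum is attained by the fixed term and the bound is no worse than that of the fixed $\epsilon$-approximation.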

{\samp is in fact related to adaptive approaches used in the Upper Confidence Bound (UCB) algorithm for multi-armed bandits, and is distinct from most existing approaches in submodular optimization, with the notable exception of \cite{singla2016noisy}, which integrates a best-arm identification algorithm into the standard greedy framework. 
}




\subsection{Additional lemmas and analysis of Theorem \ref{thm:sampling}}


\label{appdx:proof_of_samp}

In this section, we present the proof of Theorem \ref{thm:sampling}, which provides the sample complexity and approximation guarantee of the \samp algorithm. First, we restate Theorem \ref{thm:sampling}.

\noindent\textbf{Theorem \ref{thm:sampling}. }\textit{
   For any random variable $X$ that is $R$-sub-Gaussian, if we define $N_1=2R^2/\epsilon^2\log \frac{4}{\delta}$, and $\conf =R\sqrt{\frac{2}{t}\log \frac{8 t^2}{\delta}}$, then the algorithm \samplong achieves that with probability at least $1-\delta$
    \begin{enumerate}
        \item \samp on input $(w,\epsilon,\delta,\mathcal{D}_X,R)$ takes at most the minimum between
    \begin{align*}
    % \label{eq:sam_complxt}
        \left\{\frac{2R^2}{\epsilon^2}\log \left(\frac{4}{\delta}\right),\frac{8R^2}{\phi_X^2}\log\left(\frac{16R^2}{\phi_X^2}\sqrt{\frac{2}{\delta}}\right)\right\}
    \end{align*}
    noisy samples, where $R$ is as defined in Section \ref{sec:prelim}, $\phi_X = \frac{\epsilon + |w-\mathbb{E} X|}{2}$.
    \item If \samp returns true, then $\mE X\geq w-\epsilon$. If \samp returns false, then $\mE X\leq w+\epsilon$.
    \end{enumerate}
   % Besides, it holds with probability $1$ that 
    % where $\epsilon=3\epsilon$.    
    }




Before we present the detailed proof, we provide an overview. In order for \samp to correctly determine whether $\mE X$ is approximately above or below the threshold $w$, i.e. the second result of Theorem \ref{thm:sampling}, two random events must occur during \samp. The first event is that at all iterations of the for loop, the confidence region around the sample mean ($\hat{X}_t$) contains the true expected value ($\mE X$). The second event is that after the $N_1$ samples taken by the for loop on Line \ref{line:sample_N_1}, we have achieved an $\epsilon$-additive approximation of the expected value. Together, these two events mean that \samp is correct about the region containing $\mE X$ throughout the algorithm, and therefore it returns the correct answer to whether $\mE X$ is approximately above or below the threshold $w$. The following lemma states that on a run of \samp, the two events hold with probability at least $1-\delta$.

\begin{lemma}
\label{lem:clean_event}
    With probability at least $1-\delta$, the following two events hold.
    \begin{enumerate}       
    \item At any time $t\in\mathbb{N}_+$, the sample mean $\widehat{X}_t$ satisfies that
    $|\widehat{X}_t-\mE X|\leq \conf$,
    where $\conf:=R\sqrt{\frac{2}{t}\log \frac{8 t^2}{\delta}}$.
    \item The sample mean $\widehat{X}_{N_1}$ at time $N_1:=\frac{2R^2}{\epsilon^2}\log \frac{4}{\delta}$ satisfies that $|\widehat{X}_{N_1}-\mE X|\leq \epsilon  $.
    \end{enumerate}
\end{lemma}
\begin{proof}
    First, we apply Hoeffding's inequality to $\widehat{X}_{N_1}$, and it follows that 
    \begin{align*}
        P\left(|\widehat{X}_{N_1}-\mE X|\geq \epsilon \right)\leq 2\exp\left(-\frac{N_1\epsilon^2}{2R^2}\right)\leq\frac{\delta}{2}.
    \end{align*}
    Next, by applying Hoeffding's inequality for any fixed time $t$, we have that
    \begin{align*}
        P\left(|\widehat{X}_t-\mE X|\geq \conf \right)\leq \frac{\delta}{4t^2}.
    \end{align*}
By taking a union bound over all times $t$, it follows that
\begin{align*}
    &P(\exists t \text{ s.t. }|\widehat{X}_t-\mE X|\geq \conf )\\
    &\leq\sum_{t=1}^\infty P(|\widehat{X}_t-\mE X|\geq \conf )\\
    &\leq \frac{\delta}{4}\sum_{t=1}^\infty
    \frac{1}{t^2}\leq\frac{\delta}{2}.
\end{align*}
By taking the union bound again on the two events above, we have that
\begin{align*}
    &P(|\widehat{X}_{N_1}-\mE X|\geq \epsilon\text{ or }\exists t \text{ s.t. }|\widehat{X}_t-\mE X|\geq \conf )\\
    &\leq P\left(|\widehat{X}_{N_1}-\mE X|\geq \epsilon \right)+P(\exists t \text{ s.t. }|\widehat{X}_t-\mE X|\geq \conf )\\
    &\leq \delta.
\end{align*}
\end{proof}
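
For completeness, the two steps left implicit in the proof above can be checked directly. This is a sketch assuming the sub-Gaussian form of Hoeffding's inequality, $P(|\widehat{X}_t-\mE X|\geq s)\leq 2\exp\left(-\frac{ts^2}{2R^2}\right)$, which is the form used in the first display of the proof (the symbol $s$ is introduced here only for illustration). Substituting $s=\conf$ gives
\begin{align*}
    2\exp\left(-\frac{t\conf^2}{2R^2}\right)
    =2\exp\left(-\log\frac{8t^2}{\delta}\right)
    =\frac{\delta}{4t^2},
\end{align*}
and the series in the union bound is controlled by $\sum_{t=1}^\infty\frac{1}{t^2}=\frac{\pi^2}{6}\leq 2$, so that
\begin{align*}
    \frac{\delta}{4}\sum_{t=1}^\infty\frac{1}{t^2}=\frac{\pi^2\delta}{24}\leq\frac{\delta}{2}.
\end{align*}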



The second lemma required for establishing Theorem \ref{thm:sampling} concerns the number of samples that \samp takes before its approximation of $\mE X$ is sufficiently accurate so that it can terminate. The number of samples depends on how far away the true value of $f$ is from the threshold. In particular, Lemma \ref{lem:conf_int} below states that once the confidence interval goes beneath the corresponding $\phi$ value (as defined in Theorem \ref{thm:sampling}), then \samp will complete.
%But even in the worst case where the distance $\phi$ is small, \samp will get an $\epsilon$-approximation to $\mE X$ after $N_1$ samples.
Lemma \ref{lem:conf_int} and its proof are stated below.
% \textcolor{red}{TODO: Probably need to change the way this is stated, the probabilities with the two lemmas feel a little redundant, maybe there is a better way to state this second one. Maybe "Once $C_t$ satisfies blah blah, then \samp will terminate, and this occurs in at most blah blah many samples".}
% first prove Lemma \ref{lem:clean_event}, which essentially states that \samp correctly identifies elements to be added or not added to the solution throughout \alg. Next, we
\begin{lemma}
\label{lem:conf_int}
    With probability at least $1-\delta$, when the confidence interval $\conf$ satisfies that
    \begin{align*}
        \conf \leq \phi_X,
    \end{align*}
    the sampling of $ X$ finishes, where $\phi_X = \frac{\epsilon + |w-\mE X|}{2}$.
    
\end{lemma}
\begin{proof}
    % First, we prove that when $\conf\leq \epsilon$ the Algorithm \ref{alg:TAMG} ends. Notice that when $\conf\leq \epsilon$, it holds that $w-\epsilon+\conf\leq w+\epsilon-\conf$. Then one of the following statements must be true.
    % $$\deltafe\geq w-\epsilon+\conf$$ and $$\deltafe\leq w+\epsilon-\conf.$$
    % Therefore, the algorithm ends. Second, we consider the case where $\conf\leq\frac{\epsilon+w-\Delta f(S,s)}{2}$. In this case, we have that 
    % \begin{align*}
    %     \Delta f(S,s)+2\conf\leq w+\epsilon
    % \end{align*}
    % Notice that conditioned on the clean event defined in Lemma \ref{lem:clean_event_all_time}, we have that $\deltafe\leq\Delta f(S,s)+\conf$. Then
    % \begin{align*}
    %     \deltafe+\conf\leq w+\epsilon.
    % \end{align*}
    % Thus the algorithm ends.
     % If $t>N_1$, then from Alg \ref{alg:TAMG}, the algorithm ends. If $t\leq N_1$, 
     If $\conf\leq\frac{\epsilon+w-\mE X}{2}$, then we have $\mE X\leq w+\epsilon-2\conf $. From Lemma \ref{lem:clean_event}, we have that with probability at least $1-\delta$, it holds that $\widehat{X}_t-\mE X\leq\conf$. Therefore,
     \begin{align*}
         \widehat{X}_t+\conf
         &= (\widehat{X}_t-\mE X)+\mE X+\conf\\
         &\leq \conf+(w+\epsilon-2\conf)+\conf\\
         &= w+\epsilon.
     \end{align*}
     Thus the algorithm ends.
     
     % Since $t\leq N_1$, we have that $\conf\geq \epsilon$, therefore it holds that 
     Similarly, we consider the case where $\conf\leq\frac{\epsilon-w+\mE X}{2}$. In this case, we have that $\mE X\geq 2\conf+w-\epsilon$.
    Conditioned on the clean event defined in Lemma \ref{lem:clean_event}, we have that $\widehat{X}_t-\mE X\geq-\conf$. Then
    \begin{align*}
        \widehat{X}_t-\conf&= (\widehat{X}_t-\mE X)+\mE X-\conf\\
        &\geq-\conf+(2\conf+w-\epsilon)-\conf\\
        &=w-\epsilon.
    \end{align*}
    Therefore, the algorithm ends. 
\end{proof}
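
The two cases in the proof above cover the hypothesis of the lemma. As a short check, since $|w-\mE X|=\max\{w-\mE X,\ \mE X-w\}$, we have
\begin{align*}
    \phi_X=\frac{\epsilon+|w-\mE X|}{2}=\max\left\{\frac{\epsilon+w-\mE X}{2},\ \frac{\epsilon-w+\mE X}{2}\right\},
\end{align*}
so $\conf\leq\phi_X$ implies that at least one of the two case conditions holds. Here we assume, as the proof does, that \samp stops as soon as $\widehat{X}_t+\conf\leq w+\epsilon$ or $\widehat{X}_t-\conf\geq w-\epsilon$.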

Now we present the proof of Theorem \ref{thm:sampling}. 
\begin{proof}
    We first prove the result on sample complexity, which is the first result in Theorem \ref{thm:sampling}. From Lemma \ref{lem:conf_int}, we have that if
    \begin{align}
    \label{ineq:confidence_intv}
     \conf \leq \phi_X,
    \end{align}
    then Algorithm \ref{alg:samp} finishes.
    Since $\conf =R\sqrt{\frac{2}{t}\log \frac{8 t^2}{\delta}}$, the inequality (\ref{ineq:confidence_intv}) is equivalent to
    \begin{align*}
     \frac{4\log (\sqrt{\frac{8}{\delta}}t)}{t}\leq \frac{\phi^2_X}{R^2}.
    \end{align*}
    Since $\sqrt{\frac{8}{\delta}}t\geq 2$, from Lemma \ref{lem:logx_over_x}, we have that when
    \begin{align*}
        t\geq\frac{8R^2}{\phi^2_X}\log(\frac{16R^2}{\phi^2_X}\sqrt{\frac{2}{\delta}}),
    \end{align*}
    the above inequality holds and Algorithm \ref{alg:samp} ends. Therefore, the number of samples required is bounded by $\min\left\{\frac{8R^2}{\phi^2_X}\log\left(\frac{16R^2}{\phi^2_X}\sqrt{\frac{2}{\delta}}\right),N_1\right\}$. 
    % We conclude the proof by applying the above results on the number of samples for each $s$ and $S_{i,s}$.

    Next, we prove the second result in Theorem \ref{thm:sampling}. First, consider the case where $t=N_1$ when \samp ends. Conditioned on the second event in Lemma \ref{lem:clean_event}, $|\widehat{X}_{N_1}-\mE X|\leq\epsilon$. Thus
  if the algorithm returns true, $\mE X\geq \widehat{X}_{N_1} - \epsilon\geq w-\epsilon$. If the output of the algorithm is false, then $\widehat{X}_{N_1} \leq w$, and similarly we have that $\mE X\leq \widehat{X}_{N_1} + \epsilon\leq w+\epsilon$. Second, consider the case where $t<N_1$ when \samp ends. Conditioned on the first event in Lemma \ref{lem:clean_event}, if \samp returns true, then $\mE X\geq\widehat{X}_t-\conf\geq w-\epsilon$. If the output is false, then $\mE X\leq\widehat{X}_t+\conf\leq w+\epsilon$.
\end{proof}
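
The equivalence used in the first part of the proof above can be verified as follows; this is a worked step that uses only the definition of $\conf$. Since $\conf$ and $\phi_X$ are both nonnegative, squaring both sides of $\conf\leq\phi_X$ gives
\begin{align*}
    \frac{2R^2}{t}\log\frac{8t^2}{\delta}\leq\phi_X^2
    \quad\Longleftrightarrow\quad
    \frac{2}{t}\cdot 2\log\left(\sqrt{\frac{8}{\delta}}\,t\right)\leq\frac{\phi_X^2}{R^2}
    \quad\Longleftrightarrow\quad
    \frac{4\log\left(\sqrt{\frac{8}{\delta}}\,t\right)}{t}\leq\frac{\phi_X^2}{R^2},
\end{align*}
where the middle step uses $\log\frac{8t^2}{\delta}=\log\left(\left(\sqrt{\frac{8}{\delta}}\,t\right)^2\right)=2\log\left(\sqrt{\frac{8}{\delta}}\,t\right)$.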







\subsection{Proof and analysis of Theorem \ref{thm:sampling2}}
\label{appdx:proof_of_samp2}
In this section, we present the omitted proof of Theorem \ref{thm:sampling2} from Section \ref{sec:sampling}. Theorem \ref{thm:sampling2} provides another bound on the approximation error of the \samp algorithm, obtained by defining the confidence interval $C_t$ to be $C_t=\frac{3R}{t\alpha}\log\big(\frac{8t^2}{\delta}\big)$ and the worst-case sample complexity $N_1$ to be $N_1=\frac{3R}{\epsilon\alpha}\log \left(\frac{4}{\delta}\right)$. We begin by stating Theorem \ref{thm:sampling2}, followed by the proof of the theorem. Finally, we establish the lemmas crucial to the proof of the theorem.



\noindent\textbf{Theorem \ref{thm:sampling2}. }\textit{
   For any random variable $X$ that is bounded in the range of $[0,R]$, if we define $C_t=\frac{3R}{t\alpha}\log(\frac{8t^2}{\delta})$, and $N_1=\frac{3R}{\epsilon\alpha}\log \left(\frac{4}{\delta}\right)$ where $\alpha$ is an additional parameter that controls the multiplicative error rate, the algorithm \sampnewlong achieves that with probability at least $1-\delta$
    \begin{enumerate}
        \item \sampnew on input $(w,\epsilon,\delta,\mathcal{D}_X,R)$ takes at most the minimum between
    \begin{align*}
    % \label{eq:sam_complxt}
        \left\{\frac{3R}{\epsilon\alpha}\log \left(\frac{4}{\delta}\right),\frac{12R}{\alpha\phi_X'}\log\left(\frac{12R}{\alpha\phi_X'}\sqrt{\frac{8}{\delta}}\right)\right\}
    \end{align*}
    noisy samples, where $\phi_X' = \frac{\epsilon -\alpha\mE X+| w-\mathbb{E} X|}{2}$.
    \item If the output is true, then $(1+\alpha)\mE X\geq w-\epsilon$. If the output is false, then $(1-\alpha)\mE X\leq w+\epsilon$.
    \end{enumerate}
}
\begin{proof}
   First, we prove the result on the sample complexity, which is the first result in Theorem \ref{thm:sampling2}. From Lemma \ref{lem:conf_int2}, we have that if \[\conf\leq\phi_X',\] the algorithm ends. By the definition of $\conf$, this condition is equivalent to
   \begin{align*}
       \frac{3R}{t\alpha}\log(\frac{8t^2}{\delta})\leq\phi_X'.
   \end{align*}
   From Lemma \ref{lem:logx_over_x}, we have that when
   \begin{align*}
       t\geq\frac{12R}{\alpha\phi_X'}\log\big(\frac{12R}{\alpha\phi_X'}\sqrt{\frac{8}{\delta}}\big)
   \end{align*}
   the above inequality holds and thus the algorithm ends. From the description of the algorithm, we have that the number of samples is also bounded by $N_1$. Therefore, the first result in Theorem \ref{thm:sampling2} is proved.

   Next, we prove the second result, on the relation between $\mE X$ and $w$. If $t=N_1$ when \sampnew ends and the algorithm returns true, we have that with probability at least $1-\delta$,
   \begin{align*}
       (1+\alpha)\mathbb{E}X+\epsilon\geq \widehat{X}_{N_1}\geq w.
   \end{align*}
   where the first inequality follows from Lemma \ref{lem:clean_event_samp2}. If the algorithm returns false and $t=N_1$ when the algorithm ends, then with probability at least $1-\delta$, 
   \begin{align*}
       (1-\alpha)\mathbb{E}X-\epsilon\leq \widehat{X}_{N_1}\leq w.
   \end{align*} 
  Next, we consider the case where $t<N_1$ when the algorithm ends. Conditioned on the first event in Lemma \ref{lem:clean_event_samp2} and from the stopping condition of \sampnew, we can see that if \sampnew returns true, then
  \begin{align*}
       (1+\alpha)\mathbb{E}X+\epsilon\geq \widehat{X}_t-\conf+\epsilon\geq w.
   \end{align*}
   If \sampnew returns false, then
  \begin{align*}
       (1-\alpha)\mathbb{E}X-\epsilon\leq \widehat{X}_t+\conf-\epsilon\leq w.
   \end{align*} 
\end{proof}
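
For completeness, the reduction to Lemma \ref{lem:logx_over_x} in the proof above can be made explicit. The following is only a sketch: we assume the lemma is applied to a ratio of the form $\log(x)/x$, and the substitution $x=t\sqrt{8/\delta}$ is an illustrative choice that is not stated in the original argument. Since $\log\frac{8t^2}{\delta}=2\log\big(t\sqrt{8/\delta}\big)$ and $R,\alpha,t>0$, the condition $\conf\leq\phi_X'$ can be rewritten as
\begin{align*}
    \frac{3R}{t\alpha}\log\Big(\frac{8t^2}{\delta}\Big)\leq\phi_X'
    &\iff\frac{\log\big(t\sqrt{8/\delta}\big)}{t}\leq\frac{\alpha\phi_X'}{6R}\\
    &\iff\frac{\log x}{x}\leq\frac{\alpha\phi_X'}{6R}\sqrt{\frac{\delta}{8}},\qquad x=t\sqrt{\frac{8}{\delta}}.
\end{align*}
This form suggests why the factor $\sqrt{8/\delta}$ appears inside the logarithm of the bound on $t$.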

We now state and prove the lemmas used in the proof of Theorem \ref{thm:sampling2}. We start with Lemma \ref{lem:clean_event_samp2}, which defines 
two ``clean events''.
\begin{lemma}
    \label{lem:clean_event_samp2}
    With probability at least $1-\delta$, the following two events hold.
    \begin{enumerate}       
    \item At any time $t\in\mathbb{N}_+$, the sample average $\widehat{X}_t$ satisfies that
    $|\widehat{X}_t-\mE X|\leq \alpha\mathbb{E}X+\conf$,
    where $\conf:=\frac{3R}{t\alpha}\log(\frac{8t^2}{\delta})$.
    \item The sample average $\widehat{X}_{N_1}$ at time $N_1:=\frac{3R}{\epsilon\alpha}\log \left(\frac{4}{\delta}\right)$ satisfies that $|\widehat{X}_{N_1}-\mE X|\leq \alpha\mE X+\epsilon$.
    \end{enumerate}
\end{lemma}
\begin{proof}
    By applying Lemma \ref{lem:chernoff}, we have that for any fixed time step $t$, 
    \begin{align*}
        P\big(|\widehat{X}_t-\mE X|> \alpha\mathbb{E}X+\conf\big)&\leq2\exp\{-\frac{t\alpha\conf}{3R}\}\\
        &\leq\frac{\delta}{4t^2}.
    \end{align*}
    By taking the union bound over all time steps $t\in\mathbb{N}_+$, we have
    \begin{align*}
        &P\big(\exists t\in\mathbb{N}_+ \text{ s.t. }|\widehat{X}_t-\mE X|> \alpha\mathbb{E}X+\conf\big)\\
        \leq&\sum_{t=1}^{\infty}P\big(|\widehat{X}_t-\mE X|> \alpha\mathbb{E}X+\conf\big)\\
    \leq&\sum_{t=1}^{\infty}\frac{\delta}{4t^2}\leq\frac{\delta}{2}.
    \end{align*}
    Therefore, the first event in the lemma holds with probability at least $1-\delta/2$. By applying Lemma \ref{lem:chernoff} again, we have that for $t=N_1$, 
    \begin{align*}
        P\big(|\widehat{X}_{N_1}-\mE X|> \alpha\mathbb{E}X+\epsilon\big)\leq2\exp\{-\frac{N_1\alpha\epsilon}{3R}\}=\delta/2.
    \end{align*}
     It follows that the second event in the lemma holds with probability at least $1-\delta/2$. By combining the two results and applying the union bound again, we know that with probability at least $1-\delta$, the two events both hold.
\end{proof}
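
As a sanity check on the constants in the proof above, the two tail bounds follow by direct substitution of the definitions of $\conf$ and $N_1$. The only additional fact used in this sketch is $\sum_{t\geq1}t^{-2}=\pi^2/6\leq2$.
\begin{align*}
    2\exp\Big\{-\frac{t\alpha}{3R}\cdot\frac{3R}{t\alpha}\log\frac{8t^2}{\delta}\Big\}&=2\cdot\frac{\delta}{8t^2}=\frac{\delta}{4t^2},\\
    \sum_{t=1}^{\infty}\frac{\delta}{4t^2}&=\frac{\pi^2}{24}\,\delta\leq\frac{\delta}{2},\\
    2\exp\Big\{-\frac{\alpha\epsilon}{3R}\cdot\frac{3R}{\epsilon\alpha}\log\frac{4}{\delta}\Big\}&=2\cdot\frac{\delta}{4}=\frac{\delta}{2}.
\end{align*}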
Next, we prove another lemma that is used in the proof of the sample complexity result in Theorem \ref{thm:sampling2}.

\begin{lemma}
\label{lem:conf_int2}
    With probability at least $1-\delta$, when the confidence interval $\conf$ satisfies that
    \begin{align*}
        \conf \leq \phi'_X,
    \end{align*}
    the sampling of $ X$ finishes, where $\phi_X' = \frac{\epsilon-\alpha\mE X + |w-\mE X|}{2}$.
\end{lemma}
\begin{proof}
    To prove the lemma, it suffices to show that the algorithm ends whenever $C_t\leq\frac{\epsilon-\alpha\mE X + w-\mE X}{2}$ or $C_t\leq\frac{\epsilon-\alpha\mE X - w+\mE X}{2}$. First, if $C_t\leq\frac{\epsilon-\alpha\mE X + w-\mE X}{2}$, then $(1+\alpha)\mE X+2\conf\leq w+\epsilon$. Conditioned on the events in Lemma \ref{lem:clean_event_samp2}, which hold with probability at least $1-\delta$, it follows that
    \begin{align*}
        \widehat{X}_t+\conf\leq(1+\alpha)\mE X+2\conf\leq w+\epsilon.
    \end{align*}
    Thus the sampling of $X$ ends. Next, if $C_t\leq\frac{\epsilon-\alpha\mE X - w+\mE X}{2}$, then $(1-\alpha)\mE X-2\conf\geq w-\epsilon$. By Lemma \ref{lem:clean_event_samp2},
    \begin{align*}
        \widehat{X}_t-\conf\geq(1-\alpha)\mE X-2\conf\geq w-\epsilon.
    \end{align*}
    Then the algorithm ends.
\end{proof}
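
The reduction to two cases at the start of the proof above rests on the identity $|w-\mE X|=\max\{w-\mE X,\,\mE X-w\}$. Written out as a short sketch:
\begin{align*}
    \phi_X'=\frac{\epsilon-\alpha\mE X+|w-\mE X|}{2}
    =\max\Big\{\frac{\epsilon-\alpha\mE X+w-\mE X}{2},\ \frac{\epsilon-\alpha\mE X-w+\mE X}{2}\Big\}.
\end{align*}
Hence $\conf\leq\phi_X'$ holds if and only if $\conf$ is at most one of the two terms inside the maximum, which are exactly the two cases treated in the proof.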
\section{Appendix for Section \ref{sec:monotone}}
In this section, we present the content omitted from Section \ref{sec:monotone}, organized as follows. In Section \ref{appdx:compare_to_sample_before}, we compare the theoretical performance of our algorithm, \algmono, with the sampling-before-hand algorithm in the context of the influence maximization problem. Next, in Section \ref{appdx:proof_of_mono}, we provide the proof of our main result, Theorem \ref{mainthm}, which gives the theoretical guarantee of the \alg algorithm. Finally, in Section \ref{appdx:proof_of_mono2}, we provide a brief description of the \algmono algorithm and the detailed proof of Theorem \ref{thm:monotone2}.


\subsection{Comparison to the sampling-before-hand algorithm}
\label{appdx:compare_to_sample_before}
Before we describe the sampling-before-hand algorithm and dive into the comparison of this algorithm and \algmono, first we present a detailed description of the application of influence maximization. In the influence maximization problem in large-scale networks, the submodular objective is defined as follows:

{\textbf{Influence maximization. }} Suppose the social graph is described by $G=(V,E,\bar{\vect{w}})$, where $V$ is the set of nodes with $|V|=n$, $E$ denotes the set of edges, and $\bar{\vect{w}}$ is the weight vector defined on the set of edges $E$. Given a seed set $S$, let us define $f(S;\vect{w})$ to be the number of nodes reachable from the seed set $S$ under the graph realization determined by a random weight vector $\vect{w}$. Therefore, $f(S;\vect{w})$ is bounded by the number of nodes in the graph, i.e., $0\leq f(S;\vect{w})\leq n$. The submodular objective is defined as $f(S)=\mE_{\vect{w}\sim\mathcal{D}(\bar{\vect{w}})}f(S;\vect{w})$. Here $\mathcal{D}(\bar{\vect{w}})$ is the distribution of the weight vector.  

The marginal gain can be calculated as 
\begin{align*}
\Delta f(S,s)&=\mE_{\vect{w}\sim\mathcal{D}(\bar{\vect{w}})}\Delta f(S,s;\vect{w})\\
&=\mE_{\vect{w}\sim\mathcal{D}(\bar{\vect{w}})}f(S\cup\{s\};\vect{w})-\mE_{\vect{w}\sim\mathcal{D}(\bar{\vect{w}})} f(S;\vect{w}),
\end{align*}
which is also bounded in the range of $[0,n]$.

Next, we describe the sampling-before-hand algorithm, which runs as follows:
\begin{enumerate}
    \item \textbf{Sampling:} The algorithm begins by sampling $N$ i.i.d.\ graph realizations. For the $i$-th graph realization, we denote its weight vector as $\vect{w}_i$ and the corresponding function value for a set $S$ as $f_i(S)=f(S;\vect{w}_i)$.
    \item \textbf{Average objective function:} Next, we define the average function $\hat{f}$ over the sampled graph realizations. This function is given by $\hat{f}(S)=\frac{\sum_{i=1}^Nf_i(S)}{N}$ for any $S\subseteq U$.
    \item \textbf{Threshold-greedy algorithm: }We run \thresholdlong (\threshold) with the average function $\hat{f}$ as the submodular objective. The output of the threshold-greedy algorithm is returned as the solution set, denoted as $S$.
\end{enumerate}

\subsubsection{Analysis of sampling-before-hand approach}
Now we present the analysis of the sampling-before-hand algorithm. From Lemma \ref{lem:chernoff}, and by taking the union bound over all sets of size at most $\kappa$, we can prove that 
\begin{align*}
    P(\exists X \text{ with } |X|\leq \kappa \text{ s.t. } &|\hat{f}(X)-f(X)|\geq\alpha f(X)+\epsilon)\\
    &\leq2n^{\kappa}\exp\{-\frac{N\alpha\epsilon}{3n}\}.
\end{align*}
Therefore, to guarantee that $$P(\exists X \text{ with } |X|\leq \kappa \text{ s.t. } |\hat{f}(X)-f(X)|\geq\alpha f(X)+\epsilon)\leq\delta,$$ it is enough to take 
$$N\in\Omega\big(\frac{n}{\alpha\epsilon}(\kappa\log n+\log\frac{1}{\delta})\big)$$ graph realizations. Since \threshold requires $O(\frac{n}{\alpha}\log\frac{n}{\alpha})$ evaluations of $\hat{f}$, and each evaluation of $\hat{f}$ uses all $N$ realizations, the total number of evaluations of noisy realizations of $f$ would be 
\begin{align*}
    O\big(\frac{n^2}{\alpha^2\epsilon}\log\frac{n}{\alpha}(\kappa\log n+\log\frac{1}{\delta})\big).
\end{align*}
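
The choice of $N$ above can be recovered by solving the tail bound for $N$. The following short derivation is a sketch that takes the union-bound factor $2n^{\kappa}$ from the displayed inequality as given:
\begin{align*}
    2n^{\kappa}\exp\Big\{-\frac{N\alpha\epsilon}{3n}\Big\}\leq\delta
    &\iff\frac{N\alpha\epsilon}{3n}\geq\log\frac{2n^{\kappa}}{\delta}\\
    &\iff N\geq\frac{3n}{\alpha\epsilon}\Big(\kappa\log n+\log\frac{2}{\delta}\Big).
\end{align*}
Multiplying this value of $N$ by the number of evaluations of $\hat{f}$ made by the threshold-greedy algorithm gives the total count stated above.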
Next, we prove the approximation guarantee. From the analysis above, we can see that with probability at least $1-\delta$
\begin{align*}
   f(S)&\geq\frac{\hat{f}(S)-\epsilon}{1+\alpha} \\
   &\geq(1-\alpha)\hat{f}(S)-\epsilon \\
   &\geq (1-1/e-\alpha)(1-\alpha)\hat{f}(OPT)-\epsilon\\
   &\geq (1-1/e-2\alpha)\hat{f}(OPT)-\epsilon\\
   &\geq (1-1/e-3\alpha)f(OPT)-2\epsilon.
\end{align*}
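
The individual steps of the chain above can be checked as follows. This is a sketch assuming $\alpha\in(0,1)$, $\hat{f}\geq0$, $1-1/e-2\alpha\in[0,1]$, and that the threshold-greedy algorithm run on $\hat{f}$ returns a set with $\hat{f}(S)\geq(1-1/e-\alpha)\hat{f}(OPT)$.
\begin{align*}
    \frac{1}{1+\alpha}&\geq1-\alpha\quad\text{since }(1-\alpha)(1+\alpha)=1-\alpha^2\leq1,\\
    (1-1/e-\alpha)(1-\alpha)&=1-1/e-2\alpha+\alpha/e+\alpha^2\geq1-1/e-2\alpha,\\
    (1-1/e-2\alpha)(1-\alpha)&=1-1/e-3\alpha+\alpha/e+2\alpha^2\geq1-1/e-3\alpha.
\end{align*}
The first line, together with $\frac{\epsilon}{1+\alpha}\leq\epsilon$, gives the second inequality of the chain. The last line is combined with $\hat{f}(OPT)\geq(1-\alpha)f(OPT)-\epsilon$ to obtain the final inequality, which is where the second additive $\epsilon$ comes from.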
Now we compare the theoretical guarantees of the sampling-before-hand algorithm and \algmono. The theoretical results of \algmono are in Theorem \ref{thm:monotone2}. Notice that by substituting $\epsilon$ with $\epsilon/\kappa$ in Theorem \ref{thm:monotone2}, we obtain a similar approximation guarantee for \algmono: $f(S)\geq (1-1/e-O(\alpha))f(OPT)-O(\epsilon)$, which matches the result achieved by the sampling-before-hand algorithm. 

For the sample complexity, each call of \sampnew requires at most the minimum between
$O(\frac{\kappa n}{\epsilon\alpha}\log\frac{n}{\delta})$ and $O(\frac{n}{\alpha\phi'(S,u)}\log\frac{n}{\alpha\phi'(S,u)\delta})$ samples. The first bound is derived by considering the fixed $\epsilon$-approximation of the marginal gain. If we only consider this bound, then the total number of noisy samples would be $O(\frac{\kappa n^2}{\epsilon\alpha^2}(\log\frac{n}{\alpha})(\log\frac{n}{\delta}))$. In practice, the parameter $\delta$ is usually set to be $O(\mathrm{poly}(1/n))$, such as $O(1/n^2)$. Consequently, the sample complexity per evaluated marginal gain of both \algmono and the sampling-before-hand approach would be $O(\frac{\kappa n}{\epsilon\alpha}\log n)$. However, it is important to note that \sampnew employs the adaptive thresholding technique, which often allows the algorithm to terminate well before reaching the worst-case sample complexity required for fixed-confidence approximation. As a result, \algmono can be significantly more sample-efficient in practice.

In comparison to the sampling-before-hand algorithm, \algmono offers an additional advantage. The sampling-before-hand algorithm requires obtaining $N$ independent graph realizations and storing all the data at the beginning of the algorithm. However, this can pose practical challenges. Firstly, in scenarios where both $N$ and the graph are exceedingly large, storing all the data might be infeasible. Secondly, in certain applications, such as real-world social networks, obtaining an entire graph realization may not be possible, as we might only be able to sample a portion of the graph at each time.

\subsection{Proof of Theorem \ref{mainthm}}
\label{appdx:proof_of_mono}
In this section, we prove one of our main results, Theorem \ref{mainthm}, which concerns \alg for the MSMC problem. We restate the theorem as follows.

\noindent\textbf{Theorem \ref{mainthm}. }\textit{
    Suppose the noisy marginal gain of any subset $S\subseteq U$ and element $s\in U$ is $R$-sub-Gaussian. Then \alg makes at most $n\log(\kappa/\alpha)/\alpha$ calls of \samp. In addition, with probability at least $1-\delta$, the following statements hold:
    \begin{itemize}
    % [noitemsep]
        \item The exact function value of the output solution set $S$ satisfies that $f(S)\geq(1-e^{-1}-\alpha)f(OPT)-2\kappa\epsilon$;
    \item Each call of \samp on input ($w$, $\epsilon$, $\frac{2\delta}{3nh(\alpha)}$, $\mathcal{D}(S,u)$, $R$) takes at most the minimum between
    \begin{align*}
    % \label{eq:sam_complxt}
        \frac{8R^2}{\phi^2(S,u)}\log\left(\frac{16R^2\sqrt{\frac{3nh(\alpha)}{\delta}}}{\phi^2(S,u)}\right)
    \end{align*}
    and
    \begin{align*}
    % \label{eq:sam_complxt}
       \frac{2R^2}{\epsilon^2}\log \left(\frac{6nh(\alpha)}{\delta}\right)
    \end{align*}
    noisy samples. Here $OPT$ is an optimal solution to the MSMC problem, $\phi(S,u) = \frac{\epsilon + |w-\Delta f(S,u)|}{2}$, and $h(\alpha)=\frac{\log{(\kappa/\alpha)}}{\alpha}$.
    \end{itemize}
   % Besides, it holds with probability $1$ that 
    % where $\epsilon=3\epsilon$.
}

% The analysis of Theorem \ref{thm:monotone2} is similar to that of Theorem \ref{mainthm} and we defer the analysis to the appendix. 



To prove the theorem, we first present a series of needed lemmas. In order for the guarantees of Theorem \ref{mainthm} to hold, two random events must occur during \alg. The first event is that the estimate of the max singleton value of $f$ on Line \ref{alg:ATG:line:sample-mean} in \alg is an $\epsilon$-approximation of its true value. More formally, we have the following lemma.
% \begin{enumerate}
%     \item  $\mathcal{E}_1=\{\max_{s\in U}f(s)-\epsilon\leq d\leq\max_{s\in U}f(s)+\epsilon, \forall s\}$;
%     \item  $\mathcal{E}_2=\{|\widehat{\Delta f_t}(S,u)-\Delta f(S,u)|\leq \conf ,\forall t,S,u\}$;
%     \item  and $\mathcal{E}_3=\{|\widehat{\Delta f_{N_1}}(S,u)-\Delta f(S,u)|\leq \epsilon,\forall S,u\}$.
% \end{enumerate}
% % TODO: $C_t$ instead of input parameters?
% %where $N_1=R^2\log \frac{6nh(\alpha) t^2}{\delta}/(2\epsilon^2)$ and $h(\alpha)=\frac{\log{\kappa/\alpha}}{\alpha}$.
% %Before proving the result of Theorem \ref{mainthm}, we analyze the probability that each of these three events occurs in the following three Lemmas. The proof of each Lemma involves the application of Hoeffding's Inequality (Lemma \ref{hoeffding}). The three events correspond to Lemmas \ref{lem:clean_event_monotone}, \ref{lem:clean_event_all_time}, and \ref{lem:clean_event_fixed_size} respectively.
% We now show that each of these events holds with high probability. The analysis of the three events corresponds to Lemmas \ref{lem:clean_event_monotone}, \ref{lem:clean_event_all_time}, and \ref{lem:clean_event} respectively. We include proof of Lemma \ref{lem:clean_event_all_time} here in the main text, and the proofs of Lemmas \ref{lem:clean_event_monotone} and \ref{lem:clean_event_fixed_size} can be found in the supplementary material.
 \begin{lemma}
    \label{lem:clean_event_monotone}
      With probability at least $1-\delta/3$, we have $\max_{s\in U}f(s)-\epsilon\leq d\leq\max_{s\in U}f(s)+\epsilon$.
\end{lemma}
\begin{proof}
    For a fixed $s\in U$, by Hoeffding's inequality we have that
    \begin{align}
        P(|\hat{f}(s)-f(s)|\geq\epsilon)\leq\frac{\delta}{3n}.
    \end{align}
    Taking a union bound over all elements, we have that
    \begin{align*}
        P(\exists s\in U \text{ s.t. } |\hat{f}(s)-f(s)|\geq\epsilon )\leq\frac{\delta}{3}.
    \end{align*}
    Then with probability at least $1-\frac{\delta}{3}$, $|\hat{f}(s)-f(s)|\leq\epsilon$ 
    for all $s\in U$.  
    % Define $s^*=\arg\max_{s}f(s)$ and 
    % that $\hat{s}^*=\arg\max_{s}\hat{f}(s)$.
    It then follows that $\forall s\in U$, $f(s)-\epsilon\leq\hat{f}(s)\leq f(s)+\epsilon$. Therefore $$\max_{s\in U}(f(s)-\epsilon)\leq\max_{s\in U}\hat{f}(s)\leq\max_{s\in U}(f(s)+\epsilon).$$ Thus we have
    \begin{align*}
        \max_{s\in U}f(s)-\epsilon\leq d\leq\max_{s\in U}f(s)+\epsilon.
    \end{align*} 
    
    % and that
    % \begin{align*}
    %     d=\hat{f}(\hat{s}^*)\leq f(\hat{s}^*)+\epsilon\leq f(s^*)+\epsilon.
    % \end{align*}
\end{proof}

The second event is that for all calls of \samp, the result in Theorem \ref{thm:sampling} holds, which is stated formally as follows. 
 \begin{lemma}
 \label{lem:clean_event_call_to_CS_mono}
     With probability at least $1-2\delta/3$, we have that during each call of \samp with the solution set $S$ and element $u$, the output satisfies that if $thre$ is true, then $\Delta f(S,u)\geq w-\epsilon$. If $thre$ is false, then $\Delta f(S,u)\leq w+\epsilon$.
 \end{lemma}
 \begin{proof}
    First, since each sampling result of the marginal gain is assumed to be $R$-sub-Gaussian, by applying the result in Theorem \ref{thm:sampling}, we can prove that for each call of \samp during \alg with a fixed solution set $S$ and evaluated element $u$ as input, with probability at least $1-\frac{2\delta}{3nh(\alpha)}$, if the output of \samp is true, then $\Delta f(S,u)\geq w-\epsilon$. Otherwise, $\Delta f(S,u)\leq w+\epsilon$. Since there are $n$ elements in the universe and the number of iterations in Algorithm \ref{alg:ATG} is bounded by $\frac{\log(\kappa/\alpha)}{\log(1/(1-\alpha))}\leq h(\alpha)$, there are at most $nh(\alpha)$ marginal gains to evaluate in Algorithm \ref{alg:ATG}. Therefore, by taking the union bound we have that with probability at least $1-2\delta/3$, the statement holds.
\end{proof}
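
The two counting steps in the proof above can be verified directly. This is a sketch assuming $\alpha\in(0,1)$, so that $\log\frac{1}{1-\alpha}=\alpha+\frac{\alpha^2}{2}+\frac{\alpha^3}{3}+\cdots\geq\alpha$.
\begin{align*}
    \frac{\log(\kappa/\alpha)}{\log(1/(1-\alpha))}&\leq\frac{\log(\kappa/\alpha)}{\alpha}=h(\alpha),\\
    nh(\alpha)\cdot\frac{2\delta}{3nh(\alpha)}&=\frac{2\delta}{3}.
\end{align*}
The first line bounds the number of threshold values visited, and the second is the total failure probability after the union bound over at most $nh(\alpha)$ calls of \samp.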
  % The proof of Lemma \ref{lem:clean_event_monotone} and Lemma \ref{lem:clean_event_call_to_CS_mono} can be found in the supplementary material.
% \begin{lemma}
% \label{lem:clean_event_all_time}
%     With probability at least $1-\delta/3$, we have that for the solution set $S$ at any iteration of Algorithm \ref{alg:ATG}, and any element $s\in U$,
%     $|\deltafe-\Delta f(S,s)|\leq \conf \forall t\in\mathbb{N}_+$,
%     where $\conf:=R\sqrt{\frac{\log \frac{12nh(\alpha) t^2}{\delta}}{2t}}$ and $h(\alpha)=\frac{\log{(\kappa/\alpha)}}{\alpha}$.
% \end{lemma}
% \begin{proof}
%     First, we notice that by applying the Hoeffding's inequality, we have that for fixed $S$, $s$, and $t$,
%     \begin{align*}
%         P\left(|\deltafe-\Delta f(S,s)|\geq \conf \right)\leq \frac{\delta}{6nh(\alpha)t^2}.
%     \end{align*}
% By taking the union bound for fixed $S$ and $s$, it follows that
% \begin{align*}
%     &P(\exists t \text{ s.t. }|\deltafe-\Delta f(S,s)|\geq \conf )\\
%     &\leq\sum_{t=1}^\infty P(|\deltafe-\Delta f(S,s)|\geq \conf )\\
%     &\leq \frac{\delta}{6nh(\alpha)}\sum_{t=1}^\infty
%     \frac{1}{t^2}\leq\frac{\delta}{3nh(\alpha)}.
% \end{align*}
% Since there are $n$ elements in the universe and the number of iterations in Algorithm \ref{alg:ATG} is bounded by $\frac{\log{\kappa/\alpha}}{\log(1/(1-\alpha))}\leq h(\alpha)$, there are at most $nh(\alpha)$ number of marginal gains to evaluate in Algorithm \ref{alg:ATG}. Therefore, by taking the union bound we have
% $P(\exists S,s,t \text{ s.t. } |\deltafe-\Delta f(S,s)|\geq \conf )\leq \delta/3$.
% \end{proof}

% \begin{lemma}
% \label{lem:clean_event_fixed_size}
%     With probability at least $1-\delta/3$, we have that for the solution set $S$ at any iteration of Algorithm \ref{alg:ATG} and any element $s\in U$ $|\widehat{\Delta f_{N_1}}(S,s)-\Delta f(S,s)|\leq \epsilon \forall t\in\mathbb{N}_+$,
%     where $N_1:=\frac{R^2\log \frac{6nh(\alpha)}{\delta}}{2\epsilon^2}$ and $h(\alpha)=\frac{\log{(\kappa/\alpha)}}{\alpha}$.
% \end{lemma}



With Lemma \ref{lem:clean_event_monotone} and Lemma \ref{lem:clean_event_call_to_CS_mono} above, and by taking the union bound, we have that with probability at least $1-\delta$, both events hold during \alg. Our next step is to show that if both events occur during \alg, the approximation guarantee and sample complexity of Theorem \ref{mainthm} hold. To this end, we need the following Lemma \ref{lem:mar_gain}. 
% The detailed proof is deferred to the appendix.[TODO: add details] 
% prove Lemma \ref{lem:mar_gain} using Lemma \ref{lem:clean_event_monotone} and Lemma \ref{lem:clean_event_call_to_CS_mono}. Lemma \ref{lem:mar_gain} gives a lower bound on the marginal gain of $f$ each time we add an element to our solution $S$ in \alg, provided that the events hold.
% \begin{lemma}
% \label{lem:clean_event}
%     Assume events $\mathcal{E}_1$, $\mathcal{E}_2$, and $\mathcal{E}_3$ defined above hold during \alg.
%     Then at any point during \alg, if $u$ is added to $S$, $\Delta f(S,u)\geq w-\epsilon$. If at any point during \alg, $u$ is not added to $S$, then $\Delta f(S,u)\leq w+\epsilon$.
% \end{lemma}
% \begin{proof}
%      If $t=N_1$ when \samp ends, then if element $s$ is added, we have $\deltafe \geq w$. Conditioned on event $\mathcal{E}_3$, $\Delta f(S,s)\geq \deltafe - \epsilon\geq w-\epsilon$. If the element $u$ is not added, then $\deltafe \leq w$. Similarly we have that $\Delta f(S,s)\leq \deltafe + \epsilon\leq w+\epsilon$. Secondly, let us consider the case where $t<N_1$ when the algorithm \samp ends. Conditioned on event $\mathcal{E}_2$, we have if $u$ is added, $\Delta f(S,s)\geq\deltafe-\conf\geq w-\epsilon$. If $u$ is not added, $\Delta f(S,s)\leq\deltafe+\conf\leq w+\epsilon$.
% \end{proof}
% It then follows that
% \begin{lemma}
%     If the output of TAMG($w$, $\epsilon$, $\delta$) is $thre=true$, 
% \end{lemma}
% $$\mathcal{E}=\{|\deltafe-\Delta f(S,s)|\geq \conf ,\forall t,S,s\}.$$

\begin{lemma}
\label{lem:mar_gain}
    Assume the events defined in Lemma \ref{lem:clean_event_monotone} and Lemma \ref{lem:clean_event_call_to_CS_mono} above hold during \alg. Then for any element $s$ that is added to the solution set $S$, the following statement holds.
    \begin{align*}
        \Delta f(S,s)\geq \frac{1-\alpha}{\kappa}(f(OPT)-f(S))-2\epsilon.
    \end{align*}
\end{lemma}


% The final lemma needed to prove Theorem \ref{mainthm} concerns the number of samples that \samp takes in order to make a decision about adding an element to $S$. The number of samples depends on how far away the true value of $f$ is from the threshold. In particular, Lemma \ref{lem:conf_int} below states that once the confidence interval goes beneath the corresponding $\phi$ value (as defined in Theorem \ref{mainthm}), then \samp will complete. We next analyze how many samples this should take in Theorem \ref{mainthm}.

% \begin{lemma}
% \label{lem:conf_int}
%     For the solution set $S$ at any iteration of Algorithm \ref{alg:ATG} and any element $s\in U$, when the confidence interval satisfies that
%     \begin{align*}
%         \conf \leq \phi(S,s),
%     \end{align*}
%     the sampling of $\Delta f(S,s)$ finishes, where $\phi(S,s) = \frac{\epsilon + |w-\Delta f(S,s)|}{2}$.
    
% \end{lemma}
% \begin{proof}
%     % First, we prove that when $\conf\leq \epsilon$ the Algorithm \ref{alg:TAMG} ends. Notice that when $\conf\leq \epsilon$, it holds that $w-\epsilon+\conf\leq w+\epsilon-\conf$. Then one of the following statements must be true.
%     % $$\deltafe\geq w-\epsilon+\conf$$ and $$\deltafe\leq w+\epsilon-\conf.$$
%     % Therefore, the algorithm ends. Second, we consider the case where $\conf\leq\frac{\epsilon+w-\Delta f(S,s)}{2}$. In this case, we have that 
%     % \begin{align*}
%     %     \Delta f(S,s)+2\conf\leq w+\epsilon
%     % \end{align*}
%     % Notice that conditioned on the clean event defined in Lemma \ref{lem:clean_event_all_time}, we have that $\deltafe\leq\Delta f(S,s)+\conf$. Then
%     % \begin{align*}
%     %     \deltafe+\conf\leq w+\epsilon.
%     % \end{align*}
%     % Thus the algorithm ends.
%      % If $t>N_1$, then from Alg \ref{alg:TAMG}, the algorithm ends. If $t\leq N_1$, 
%      If $\conf\leq\frac{\epsilon+w-\Delta f(S,s)}{2}$, then we have $\Delta f(S,s)\leq w+\epsilon-2\conf $. From Lemma \ref{lem:clean_event}, we have that 
%      \begin{align*}
%          &\deltafe+\conf\\
%          &\leq (\deltafe-\Delta f(S,s))+\Delta f(S,s)+\conf\\
%          &\leq w+\epsilon.
%      \end{align*}
%      Thus the algorithm ends.
     
%      % Since $t\leq N_1$, we have that $\conf\geq \epsilon$, therefore it holds that 
%      Similarly, we consider the case where $\conf\leq\frac{\epsilon-w+\Delta f(S,s)}{2}$. In this case, we have that $\Delta f(S,s)\geq 2\conf+w-\epsilon$.
%     Notice that conditioned on the clean event defined in Lemma \ref{lem:clean_event_all_time}, we have that $\deltafe-\Delta f(S,s)\geq-\conf$. Then
%     \begin{align*}
%         \deltafe-\conf&\geq \deltafe-\Delta f(S,s)\\
%         &\qquad+\Delta f(S,s)-\conf\\
%         &\geq-\conf+2\conf\\
%         &\qquad+w-\epsilon-\conf\\
%         &=w-\epsilon.
%     \end{align*}
%     Therefore, the algorithm ends. 
% \end{proof}
\begin{proof}   
    At the first iteration, if an element $s$ is added to the solution set, it holds by Lemma \ref{lem:clean_event_call_to_CS_mono} that $\Delta f(S,s)\geq w-\epsilon$.
   Since at the first iteration $w=d$ and, by Lemma \ref{lem:clean_event_monotone}, $d\geq \max_{s\in U}f(s)-\epsilon$, it follows that $\Delta f(S,s)\geq \max_{s\in U}f(s)-2\epsilon$.
    By submodularity we have that $\kappa\max_{s\in U}f(s)\geq f(OPT)$. Therefore, $\Delta f(S,s)\geq \frac{f(OPT)-f(S)}{\kappa}-2\epsilon$.
    
    At iteration $i$ where $i>1$, if an element $o\in OPT$ is not added to the solution set, then it was not added to the solution at the previous iteration, where the threshold was $\frac{w}{1-\alpha}$. By Lemma \ref{lem:clean_event_call_to_CS_mono}, we have
    $\Delta f(S,o)\leq \frac{w}{1-\alpha}+\epsilon$.
    % By Lemma \ref{lem:clean_event_all_time}, we have that 
    % $$\Delta f(S,o)\leq \widehat{\Delta f_t(S,o)}+\conf \leq \frac{w}{1-\alpha}+\epsilon.$$ 
    For any element $s$ that is added to the solution at iteration $i$, by Lemma \ref{lem:clean_event_call_to_CS_mono} it holds that $\Delta f(S,s)\geq w-\epsilon$. Therefore, we have
    \begin{align*}
        \Delta f(S,s)&\geq w-\epsilon \\
        &\geq(1-\alpha)(\Delta f(S,o)-\epsilon)-\epsilon\\
        &\geq (1-\alpha)\Delta f(S,o)-2\epsilon. 
    \end{align*}
    By submodularity, it holds that $\Delta f(S,s)\geq (1-\alpha)\frac{f(OPT)-f(S)}{\kappa}-2\epsilon$.
\end{proof}
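
The appeals to submodularity in the proof above can be spelled out as follows. This is a sketch under the standing assumptions that $f$ is monotone and submodular and that $|OPT|\leq\kappa$. We additionally assume $OPT\setminus S\neq\emptyset$, since otherwise $f(OPT)\leq f(S)$ by monotonicity.
\begin{align*}
    f(OPT)-f(S)&\leq f(OPT\cup S)-f(S)\\
    &\leq\sum_{o\in OPT\setminus S}\Delta f(S,o)\\
    &\leq\kappa\max_{o\in OPT\setminus S}\Delta f(S,o).
\end{align*}
Applying $\Delta f(S,s)\geq(1-\alpha)\Delta f(S,o)-2\epsilon$ to the maximizing element $o$ then gives the bound claimed in the lemma.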
We now prove the main result, Theorem \ref{mainthm}, which relies on Lemmas \ref{lem:clean_event_monotone}, \ref{lem:clean_event_call_to_CS_mono} and \ref{lem:mar_gain}.
\begin{proof}
    By Lemmas \ref{lem:clean_event_monotone} and \ref{lem:clean_event_call_to_CS_mono} and the union bound, the events defined in these two lemmas hold simultaneously with probability at least $1-\delta$. Therefore, in order to prove Theorem \ref{mainthm}, we assume that both events have occurred. 
    The proof of the first result in the theorem depends on Lemma \ref{lem:mar_gain}. First, consider the case where the output solution set satisfies $|S|=\kappa$. Denote the solution set $S$ after the $i$-th element is added as $S_i$. Then by Lemma \ref{lem:mar_gain}, we have
    \begin{align*}
        f(S_{i+1})\geq \frac{1-\alpha}{\kappa}f(OPT)+(1-\frac{1-\alpha}{\kappa})f(S_i)-2\epsilon.
    \end{align*}
By induction, we have that
\begin{align*}
    f(S_{\kappa})&\geq(1-(1-\frac{1-\alpha}{\kappa})^{\kappa})\{f(OPT)-\frac{2\kappa\epsilon}{1-\alpha}\}\\
    &\geq (1-e^{-1+\alpha})\{f(OPT)-\frac{2\kappa\epsilon}{1-\alpha}\}\\
    &\geq(1-e^{-1}-\alpha)\{f(OPT)-\frac{2\kappa\epsilon}{1-\alpha}\}\\
    &\geq(1-e^{-1}-\alpha)f(OPT)-2\kappa\epsilon.
\end{align*}
 If the size of the output solution set $S$ is smaller than $\kappa$, then any element $o\in OPT$ that is not added to $S$ at the last iteration satisfies that
$\Delta f(S,o)\leq w+\epsilon$.
Since the threshold $w$ in the last iteration satisfies that $w\leq\frac{\alpha d}{\kappa}$, we have
\begin{align*}
    \Delta f(S,o)&\leq\frac{\alpha d}{\kappa}+\epsilon.
\end{align*}
It follows that
\begin{align*}
    \sum_{o\in OPT\backslash S}\Delta f(S,o)
    &\leq\alpha (\max_{s\in U}f(s)+\epsilon)+\kappa\epsilon\\
    &\leq\alpha f(OPT)+2\kappa\epsilon.
\end{align*}
% where the last inequality comes from the fact that  $\epsilon\leq\frac{1}{3}$. This is because if $\epsilon\geq\frac{1}{3}$, the bound in the theorem is trivial.
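 By submodularity and monotonicity of $f$, we have $f(OPT)-f(S)\leq\sum_{o\in OPT\backslash S}\Delta f(S,o)$, and hence $f(S)\geq (1-\alpha)f(OPT)-2\kappa\epsilon\geq(1-e^{-1}-\alpha)f(OPT)-2\kappa\epsilon$.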



\end{proof}
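
The induction in the first case of the proof above can be unrolled explicitly. The following is a sketch that writes $a:=\frac{1-\alpha}{\kappa}$ and $g_i:=f(OPT)-\frac{2\epsilon}{a}-f(S_i)$, both of which are shorthand introduced here only for illustration, and assumes $f(S_0)=f(\emptyset)\geq0$ and $\alpha\in(0,1)$. Subtracting both sides of the recursion from $f(OPT)-\frac{2\epsilon}{a}$ gives $g_{i+1}\leq(1-a)g_i$, so that
\begin{align*}
    g_{\kappa}&\leq(1-a)^{\kappa}g_0\leq(1-a)^{\kappa}\Big(f(OPT)-\frac{2\kappa\epsilon}{1-\alpha}\Big),\\
    (1-a)^{\kappa}&\leq e^{-a\kappa}=e^{-1+\alpha}\leq e^{-1}+\alpha.
\end{align*}
The first line rearranges to the first inequality of the induction display, using $\frac{2\epsilon}{a}=\frac{2\kappa\epsilon}{1-\alpha}$. In the second line, the final step uses the convexity of the exponential on $[0,1]$, and it is applied when $f(OPT)-\frac{2\kappa\epsilon}{1-\alpha}\geq0$; otherwise the final bound of the theorem is trivial for nonnegative $f$. The last inequality of the display then follows from $\frac{1-e^{-1}-\alpha}{1-\alpha}\leq1$.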

% \begin{multicols}{2}
% \begin{center}
% \includegraphics[width=2\linewidth]{figures/sample_complexity.pdf}
%   \captionof{figure}{a222}
% \end{center}

% \end{multicols}

% With the above lemma, we can present the bound on query complexity, which is one of the main results in the paper.
%\begin{theorem}
%\label{thm:sample_complexity}
%    With probability at least $1-\delta$, the number of queries required by any call of \samp throughout \alg is upper-bounded by
%    \begin{align*}
%        \min\left(\frac{2R^2}{\phi^2(S,s)}\log\left(\frac{4R^2\sqrt{\frac{3nh(\alpha)}{\delta}}}{\phi^2(S,s)}\right),\frac{R^2}{2\epsilon^2}\log \left(\frac{6nh(\alpha)}{\delta}\right)\right).
%    \end{align*}
%\end{theorem}

%By combining the required number of queries for all the marginal gains evaluated in Algorithm \ref{alg:ATG}, we have that
%\begin{corollary}
 %   With probability at least $1-\delta$, the number of queries required by the Algorithm \ref{alg:ATG} is upper-bounded by
 %   \begin{align*}
 %       \sum_{i=1}^{T}\sum_{s\in U}\min\{\frac{2R^2}{g_i^2(s)}\log(\frac{4R^2\sqrt{\frac{3nh(\alpha)}{\delta}}}{g_i^2(s)}),\frac{R^2}{2\epsilon^2}\log \frac{6nh(\alpha)}{\delta}\}
 %   \end{align*}
 %   where $g_i(s)=\frac{\epsilon+|w_i-\Delta f(S_{i,s},s)|}{2}$. Here $w_i$ is the threshold at the $i$-th iteration. $S_{i,s}$ is the solution set when
 %   before the algorithm processing element $s$ at iteration $i$. $T$ is the total number of iterations.
%\end{corollary}


%\subsection{Comparison to Previous Result}
%\label{sec:compare to previous result}

% \begin{corollary}
%     With probability at least $1-\delta$, we have that the number of queries is upper bounded by 
%     \begin{align*}
%         O(\frac{nR^2}{\epsilon g^2}(\log\frac{n}{\epsilon})(\log\frac{R}{g\delta})).
%     \end{align*}
%     where $g=\min_{i\in[T],s\in U}g_i(s)$.
% \end{corollary}



% \noindent\textbf{Lemma \ref{lem:clean_event_call_to_CS_mono}. }\textit{
%      With probability at least $1-2\delta/3$, we have that during each call of \samp with the solution set $S$ and element $u$, the output satisfies that if $thre$ is true, then $\Delta f(S,u)\geq w-\epsilon$. If $thre$ is false, then $\Delta f(S,u)\leq w+\epsilon$.
% }



\subsection{Analysis of \algmono}
\label{appdx:proof_of_mono2}
In this section, we analyze Theorem \ref{thm:monotone2}, which establishes the sample complexity and approximation ratio guarantees for the solution obtained by \alglongmono (\algmono). \algmono is an algorithm for the MSMC problem where only noisy queries to $\Delta f$ are available. The corresponding algorithm description is presented in Algorithm \ref{alg:ATG2}. 


First, we give a brief description of the \algmono algorithm. \algmono shares a similar idea with the \alg algorithm presented in Section \ref{sec:monotone}. Both algorithms utilize \samp to determine if the expectation of the evaluated marginal gain is approximately above a threshold $w$. However, they differ in their approximation error guarantees on the expectation of the evaluated marginal gain. Specifically, \algmono invokes the \samplong procedure (\samp) with the following inputs: threshold $w$, approximation error bound $\epsilon$, error probability $\frac{2\delta}{3nh'(\alpha)}$ where $h'(\alpha)=\frac{3\log{(3\kappa/\alpha)}}{\alpha}$, random distribution $\mathcal{D}(S,u)$, and upper bound $R$ on the noisy marginal gain. Different from the subroutine \samp in \alg, the worst-case query complexity $N_1$ and confidence interval $C_t$ in \samp are defined as in Theorem \ref{thm:sampling2} with the multiplicative input parameter set to $\alpha/3$.
Therefore, the output of \samp in \algmono satisfies that with high probability, if the output is true, then $(1+\alpha/3)\Delta f(S,u)\geq w-\epsilon$. If the output is false, then $(1-\alpha/3)\Delta f(S,u)\leq w+\epsilon$.

% The proof relies on the properties of the \sampnew subroutine, which is used by \algmono to adaptively determine the number of noisy queries required for each marginal gain evaluation. 

Next, we present the analysis of Theorem \ref{thm:monotone2}.


\textbf{Theorem \ref{thm:monotone2}. }
   \textit{
    Suppose the noisy marginal gain of any subset $S\subseteq U$ and element $s\in U$ is bounded in $[0,R]$. Then \algmono makes at most $3n\log(\kappa/\alpha)/\alpha$ calls of \samp. In addition, with probability at least $1-\delta$, the following statements hold:
    \begin{itemize}[noitemsep]
        \item The exact function value of the output solution set $S$ satisfies that $f(S)\geq(1-e^{-1}-\alpha)f(OPT)-2\kappa\epsilon$;
    \item Each call of \samp on input ($w$, $\epsilon$, $\frac{2\delta}{3nh'(\alpha)}$, $\mathcal{D}(S,u)$, $R$) takes at most the minimum between
    \begin{align*}
\frac{9R}{\epsilon\alpha}\log \left(\frac{6nh'(\alpha)}{\delta}\right)
    \end{align*}
    and 
    \begin{align*}
    % \label{eq:sam_complxt}
       \frac{36R}{\alpha\phi'(S,u)}\log\left(\frac{36R}{\alpha\phi'(S,u)}\sqrt{\frac{12nh'(\alpha)}{\delta}}\right)
    \end{align*}
    noisy samples. Here $OPT$ is an optimal solution to the MSMC problem, $\phi'(S,u) = \frac{\epsilon -\alpha\Delta f(S,u)/3+ |w-\Delta f(S,u)|}{2}$, and $h'(\alpha)=\frac{3}{\alpha}\log{(\frac{3\kappa}{\alpha})}$.
    \end{itemize}
   % Besides, it holds with probability $1$ that 
    % where $\epsilon=3\epsilon$.
}

% We refer the reader again to Figure \ref{fig:samp} in order to see a depiction of these different cases for \samp.

%\begin{theorem}
%    \label{mainthm}
 %   With probability at least $1-\delta$, the output solution set $S$ of \alg (Algorithm \ref{alg:ATG}) satisfies that
 %   \begin{align*}
 %       f(S)\geq(1-e^{-1}-\alpha)f(OPT)-6\kappa\epsilon.
 %   \end{align*}
    % where $\epsilon=3\epsilon$.
 %   Further, \alg makes TODO calls of \samp, and each call of \samp on input $(w,\epsilon,\delta,S,s)$ takes TODO noisy samples.
%\end{theorem}



We now prove Theorem \ref{thm:monotone2}. The proof is organized as follows: we first present the proof of the theorem itself, and then prove the two lemmas it relies on, Lemma \ref{lem:clean_event_call_to_CS_mono2} and Lemma \ref{lem:mono_iterative_result}.
\label{appdx:mono2}

\begin{algorithm}[t]
\caption{\alglongmono (\algmono)}\label{alg:ATG2}
 \begin{algorithmic}[1]
 \STATE \textbf{Input:} $\epsilon$, $\delta, \alpha$
 % \STATE  Define $\epsilon':=\frac{\epsilon}{6\kappa}$
 \STATE $N_3\gets \frac{9R}{\epsilon\alpha}\log\frac{6n}{\delta}$
 \FORALL{$s\in U$}
 \STATE $\hat{f}_{N_3}(s) \gets $ sample mean over $N_3$ samples from $\mathcal{D}(\emptyset,s)$ \label{alg:ATG:line:sample-mean2}
 % \STATE \label{line:BAI}  repeatedly sample $f(s)$ for $N_1:=\frac{R^2\log(6n/\delta)}{2\epsilon^2}$ number of times, define the empirical mean $\hat{f}(s)=\frac{\sum_{i=1}^{N_1}f_i(s)}{N_1}$.
 \ENDFOR
  
  \STATE $d\gets\max_{s\in U}\hat{f}_{N_3}(s)$
 % using BAI to approximate $\max_{s\in S}f(s)$, i.e., $(1+\epsilon)\max_{s\in U}f(s)\geq d\geq (1-\epsilon)\max_{s\in U}f(s)$.
 \STATE $w\gets d$, $S\gets \emptyset$
 \WHILE{$w> \frac{\alpha d}{3\kappa}$}
 \FORALL{$u\in U$} \label{line:algmono_loop_start}
\IF{$|S|<\kappa$}
 \STATE thre = \samplong($w$, $\epsilon$, $\frac{2\delta}{3nh'(\alpha)}$, $\mathcal{D}(S,u)$, $R$)
 \IF{thre}
 \STATE $S\gets S\cup \{u\}$
 \ENDIF
  \ENDIF
 \ENDFOR

 \STATE $w\gets w(1-\alpha/3)$
 \label{line:algmono_loop_end}
 \ENDWHILE
 \STATE \textbf{return} $S$
 \end{algorithmic}
\end{algorithm}


    \begin{proof}
        First, the while loop in \algmono (see Algorithm \ref{alg:ATG2}, Line \ref{line:algmono_loop_start} to Line \ref{line:algmono_loop_end}) runs for at most $\frac{3}{\alpha}\log\frac{3\kappa}{\alpha}$ iterations, and each iteration makes at most $n$ calls to \sampnew, so \algmono makes at most $\frac{3n}{\alpha}\log\frac{3\kappa}{\alpha}$ calls to \sampnew. Next, we prove the second result in Theorem \ref{thm:monotone2}, which bounds the number of samples required. By applying Lemma \ref{lem:clean_event_call_to_CS_mono2} to the sampling of the noisy marginal gain $\Delta f(S,u)$, we see that with probability at least $1-\delta$, the number of noisy queries in each call to \sampnew is bounded by the minimum between 
        $\frac{9R}{\epsilon\alpha}\log \left(\frac{6nh'(\alpha)}{\delta}\right)$ and $\frac{36R}{\alpha\phi'(S,u)}\log\left(\frac{36R}{\alpha\phi'(S,u)}\sqrt{\frac{12nh'(\alpha)}{\delta}}\right)$.
        
        Now we prove the first result. Since the proof is similar to that of Theorem \ref{mainthm}, we provide a proof sketch and omit the details. Let $S_i$ denote the solution set $S$ after the $i$-th element is added. By Lemma \ref{lem:mono_iterative_result}, we have
        \begin{align*}
             f(S_{i+1})\geq \frac{1-\alpha}{\kappa}f(OPT)+\left(1-\frac{1-\alpha}{\kappa}\right)f(S_i)-2\epsilon.
        \end{align*}
        Notice that the result in Lemma \ref{lem:mono_iterative_result} is the same as that in Lemma \ref{lem:mar_gain}. Therefore, following the same argument as in the proof of Theorem \ref{mainthm}, if $|S|=\kappa$, then by induction
        \begin{align*}
    f(S_{\kappa})
    &\geq(1-e^{-1}-\alpha)f(OPT)-2\kappa\epsilon.
\end{align*}
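
For completeness, the induction can be unrolled as follows. This is only a sketch of the standard argument, under the assumptions that $f(S_0)=f(\emptyset)\geq 0$ and $\alpha\in(0,1)$; the shorthand $\beta:=1-\frac{1-\alpha}{\kappa}$ is introduced here purely for illustration. The recursion above is equivalent to $f(OPT)-f(S_{i+1})\leq\beta\,(f(OPT)-f(S_i))+2\epsilon$, and applying it $\kappa$ times gives
\begin{align*}
    f(OPT)-f(S_\kappa)
    &\leq \beta^{\kappa}\,(f(OPT)-f(S_0))+2\epsilon\sum_{j=0}^{\kappa-1}\beta^{j}\\
    &\leq \left(1-\frac{1-\alpha}{\kappa}\right)^{\kappa}f(OPT)+2\kappa\epsilon\\
    &\leq e^{-(1-\alpha)}f(OPT)+2\kappa\epsilon\\
    &\leq (e^{-1}+\alpha)f(OPT)+2\kappa\epsilon.
\end{align*}
The second line uses $\beta\in[0,1]$ and $f(S_0)\geq 0$, the third uses $1-x\leq e^{-x}$, and the last uses the convexity of $\alpha\mapsto e^{\alpha-1}$ on $[0,1]$, which gives $e^{\alpha-1}\leq e^{-1}+\alpha(1-e^{-1})\leq e^{-1}+\alpha$. Rearranging yields the bound stated above.
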
If the size of the output solution set $S$ is smaller than $\kappa$, then any element $o\in OPT$ that is not added to $S$ at the last iteration satisfies
$(1-\alpha/3)\Delta f(S,o)\leq w+\epsilon$. Since at the last iteration $w\leq\frac{\alpha d}{3\kappa}$ and, conditioned on the events in Lemma \ref{lem:clean_event_call_to_CS_mono2}, $d\leq(1+\alpha/3)\max_{s\in U}f(s)+\epsilon$, it follows that 
\begin{align*}
    (1-\alpha/3)\Delta f(S,o)\leq \frac{\alpha }{3\kappa}\left\{(1+\alpha/3)\max_{s\in U}f(s)+\epsilon\right\}+\epsilon.
\end{align*}
By submodularity and monotonicity of $f$, we have
\begin{align*}
    f(OPT)-f(S)&\leq\sum_{o\in OPT}\Delta f(S,o)\\
    &\leq \frac{\alpha }{3(1-\alpha/3)}\{(1+\alpha/3)\max_{s\in U}f(s)+\epsilon\}\\
    &\qquad+\frac{\kappa\epsilon}{(1-\alpha/3)}\\
    &\leq\alpha\max_{s\in U}f(s)+2\kappa\epsilon\\
    &\leq \alpha f(OPT)+2\kappa\epsilon.
\end{align*}
Then we have $f(S)\geq (1-\alpha)f(OPT)-2\kappa\epsilon$, which implies the claimed bound since $1-\alpha\geq 1-e^{-1}-\alpha$.
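
The third inequality in the last display can be checked term by term. As a sketch, assuming $\alpha\in(0,1]$ and $\kappa\geq 1$ (assumptions stated here only to make the arithmetic explicit), the coefficient of $\max_{s\in U}f(s)$ and the total coefficient of $\epsilon$ satisfy
\begin{align*}
    \frac{\alpha(1+\alpha/3)}{3(1-\alpha/3)}
    &=\alpha\cdot\frac{3+\alpha}{3(3-\alpha)}\leq\alpha,\\
    \frac{\alpha}{3(1-\alpha/3)}+\frac{\kappa}{1-\alpha/3}
    &=\frac{\alpha+3\kappa}{3-\alpha}\leq 2\kappa.
\end{align*}
The first bound holds because $3+\alpha\leq 9-3\alpha$ whenever $\alpha\leq 3/2$. The second is equivalent to $\alpha(1+2\kappa)\leq 3\kappa$, which holds since $\alpha\leq 1$ and $1+2\kappa\leq 3\kappa$ for $\kappa\geq 1$.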
    \end{proof}
    The proof of Theorem \ref{thm:monotone2} above depends on Lemma \ref{lem:mono_iterative_result}. Before proving Lemma \ref{lem:mono_iterative_result}, we first prove Lemma \ref{lem:clean_event_call_to_CS_mono2}.
 \begin{lemma}
 \label{lem:clean_event_call_to_CS_mono2}
     With probability at least $1-\delta$, the following two events hold.
     \begin{enumerate}
     \item $(1-\alpha/3)\max_{s\in U}f(s)-\epsilon\leq d\leq(1+\alpha/3)\max_{s\in U}f(s)+\epsilon$.
         \item During each call of \sampnew on input ($w$, $\epsilon$, $\frac{2\delta}{3nh'(\alpha)}$, $\mathcal{D}(S,u)$, $R$), if the output is true, then $(1+\alpha/3)\Delta f(S,u)\geq w-\epsilon$. If the output is false, then $(1-\alpha/3)\Delta f(S,u)\leq w+\epsilon$. In addition, the number of samples taken by \sampnew is at most the minimum between
         \begin{align}
    \label{eq:sam_complxt1}
       \frac{9R}{\epsilon\alpha}\log \left(\frac{6nh'(\alpha)}{\delta}\right)
    \end{align}
    and 
    \begin{align}
    \label{eqn:sam_complx2}
        \frac{36R}{\alpha\phi'(S,u)}\log\left(\frac{36R}{\alpha\phi'(S,u)}\sqrt{\frac{12nh'(\alpha)}{\delta}}\right),
    \end{align}
    
    where $\phi'(S,u) = \frac{\epsilon -\alpha\Delta f(S,u)/3+ |w-\Delta f(S,u)|}{2}$, and $h'(\alpha)=\frac{3}{\alpha}\log{(\frac{3\kappa}{\alpha})}$.
     \end{enumerate}
 \end{lemma}
 
 \begin{proof}
      First of all, by applying the inequality in Lemma \ref{lem:chernoff} with $N_3=\frac{9R}{\epsilon\alpha}\log\frac{6n}{\delta}$ samples, we have that for any fixed element $s\in U$,
      \begin{align*}
          P\big(|\hat{f}_{N_3}(s)-f(s)|\geq\frac{\alpha}{3}f(s)+\epsilon\big)\leq\frac{\delta}{3n}.
      \end{align*}
      Taking a union bound over all $n$ elements in $U$, it follows that 
      \begin{align*}
          P\big(\exists s\in U:\ |\hat{f}_{N_3}(s)-f(s)|\geq\frac{\alpha}{3}f(s)+\epsilon\big)\leq\frac{\delta}{3}.
      \end{align*}
      Therefore, with probability at least $1-\delta/3$, we have $|\hat{f}_{N_3}(s)-f(s)|\leq\frac{\alpha}{3}f(s)+\epsilon$ for each $s\in U$. Denote $s_1=\arg\max_{s\in U}\hat{f}_{N_3}(s)$ and $s_2=\arg\max_{s\in U}{f}(s)$. It follows that with probability at least $1-\delta/3$, we have that 
      \begin{align*}
          d=\hat{f}_{N_3}(s_1)\leq (1+\alpha/3)f(s_1)+\epsilon\leq(1+\alpha/3)f(s_2)+\epsilon,
      \end{align*}
      and that 
      \begin{align*}
          d=\hat{f}_{N_3}(s_1)\geq \hat{f}_{N_3}(s_2)\geq(1-\alpha/3)f(s_2)-\epsilon.
      \end{align*}
      Since $d=\max_{s\in U}\hat{f}_{N_3}(s)=\hat{f}_{N_3}(s_1)$ and $f(s_2)=\max_{s\in U}f(s)$, the first result holds with probability at least $1-\delta/3$. 

      
      Next, we prove the second result. For
      each call of the sampling algorithm \sampnew with fixed input ($w$, $\epsilon$, $\frac{2\delta}{3nh'(\alpha)}$, $\mathcal{D}(S,u)$, $R$), since $N_1$ and $C_t$ are defined in accordance with Theorem \ref{thm:sampling2} with the multiplicative error parameter set to $\alpha/3$, we can apply the second result in Theorem \ref{thm:sampling2}. Consequently, with probability at least $1-\frac{2\delta}{3nh'(\alpha)}$, the following two statements hold: 
      \begin{enumerate}
          \item If the output of \sampnew is true, then $(1+\alpha/3)\Delta f(S,u)\geq w-\epsilon$. If the output is false, then $(1-\alpha/3)\Delta f(S,u)\leq w+\epsilon$.
          \item The number of noisy queries is bounded by the minimum between (\ref{eq:sam_complxt1}) and (\ref{eqn:sam_complx2}) in the lemma.
      \end{enumerate}
       Since there are at most $\frac{\log(3\kappa/\alpha)}{\log\frac{1}{1-\alpha/3}}\leq h'(\alpha)=\frac{3}{\alpha}\log\frac{3\kappa}{\alpha}$ iterations in \algmono and each iteration makes at most $n$ calls, there are at most $nh'(\alpha)$ calls to \sampnew. Therefore, by taking the union bound, we have that with probability at least $1-2\delta/3$, the two statements above hold for all calls to \sampnew during \algmono. By taking the union bound again with the first result, we have that with probability at least $1-\delta$, the two results in the lemma both hold.
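
       The iteration count and the probability accounting can be made explicit as follows; this is a sketch that assumes $d>0$ and $\alpha\in(0,3)$ so that all logarithms are well defined, and it ignores rounding to an integer. After $m$ passes of the while loop the threshold equals $d(1-\alpha/3)^m$, so the loop exits once $(1-\alpha/3)^m\leq\frac{\alpha}{3\kappa}$. Using $\log\frac{1}{1-x}\geq x$ for $x\in[0,1)$ with $x=\alpha/3$,
       \begin{align*}
           m\leq\frac{\log(3\kappa/\alpha)}{\log\frac{1}{1-\alpha/3}}\leq\frac{3}{\alpha}\log\frac{3\kappa}{\alpha}=h'(\alpha).
       \end{align*}
       For instance, with the illustrative values $\alpha=0.3$ and $\kappa=10$, the middle quantity is approximately $43.7$, while $h'(\alpha)\approx 46.1$. The failure probabilities then add up as
       \begin{align*}
           \underbrace{\frac{\delta}{3}}_{\text{estimate of } d}+\underbrace{nh'(\alpha)\cdot\frac{2\delta}{3nh'(\alpha)}}_{\text{all calls to the sampling subroutine}}=\delta.
       \end{align*}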
    \end{proof}
    Now we prove Lemma \ref{lem:mono_iterative_result}.
    
    \begin{lemma}
        \label{lem:mono_iterative_result}
         Assume the events defined in Lemma \ref{lem:clean_event_call_to_CS_mono2} hold during \algmono. Then for any element $s$ that is added to the solution set $S$, the following statement holds.
    \begin{align*}
        \Delta f(S,s)\geq \frac{1-\alpha}{\kappa}(f(OPT)-f(S))-2\epsilon.
    \end{align*}
    \end{lemma}
    \begin{proof}   
    At the first iteration, if an element $s$ is added to the solution set, it holds by Lemma \ref{lem:clean_event_call_to_CS_mono2} that $(1+\frac{\alpha}{3})\Delta f(S,s)\geq w-\epsilon$.
   Since at the first iteration $w=d$ and $d\geq (1-\alpha/3)\max_{s'\in U}f(s')-\epsilon$, it follows that $\Delta f(S,s)\geq \frac{1-\alpha/3}{1+\alpha/3}\max_{s'\in U}f(s')-\frac{2\epsilon}{1+\alpha/3}\geq(1-\alpha)\max_{s'\in U}f(s')-2\epsilon$.
    By submodularity we have that $\kappa\max_{s'\in U}f(s')\geq f(OPT)$. Therefore, since $f$ is nonnegative, $\Delta f(S,s)\geq \frac{1-\alpha}{\kappa}(f(OPT)-f(S))-2\epsilon$.
    
    At iteration $i$ where $i>1$, if an element $o\in OPT$ has not been added to the solution set, then in particular it was not added at the previous iteration, where the threshold was $\frac{w}{1-\alpha/3}$. By Lemma \ref{lem:clean_event_call_to_CS_mono2} and submodularity, we have
    $(1-\alpha/3)\Delta f(S,o)\leq \frac{w}{1-\alpha/3}+\epsilon$.
    % By Lemma \ref{lem:clean_event_all_time}, we have that 
    % $$\Delta f(S,o)\leq \widehat{\Delta f_t(S,o)}+\conf \leq \frac{w}{1-\alpha}+\epsilon.$$ 
    For any element $s$ that is added to the solution at iteration $i$, by Lemma \ref{lem:clean_event_call_to_CS_mono2} it holds that $(1+\alpha/3)\Delta f(S,s)\geq w-\epsilon$. Therefore, we have
    \begin{align*}
        \Delta f(S,s)&\geq \frac{w-\epsilon}{1+\alpha/3} \\
        &\geq\frac{(1-\alpha/3)^2\Delta f(S,o)-(1-\alpha/3)\epsilon-\epsilon}{1+\alpha/3} \\
        &\geq (1-\alpha)\Delta f(S,o)-2\epsilon. 
    \end{align*}
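    The last two steps of this chain can be verified directly; the following is a sketch assuming $\alpha\in(0,1]$, $\epsilon\geq 0$, and $\Delta f(S,o)\geq 0$ (the latter by monotonicity). Rearranging the rejection condition for $o$ gives $w\geq(1-\alpha/3)^2\Delta f(S,o)-(1-\alpha/3)\epsilon$, which yields the second line. For the third line, the coefficient of $\Delta f(S,o)$ and the additive error satisfy
    \begin{align*}
        (1-\alpha)(1+\alpha/3)&=1-\tfrac{2\alpha}{3}-\tfrac{\alpha^2}{3}\leq 1-\tfrac{2\alpha}{3}+\tfrac{\alpha^2}{9}=(1-\alpha/3)^2,\\
        \frac{(1-\alpha/3)\epsilon+\epsilon}{1+\alpha/3}&=\frac{2-\alpha/3}{1+\alpha/3}\,\epsilon\leq 2\epsilon.
    \end{align*}
    Dividing the first relation by $1+\alpha/3$ gives $\frac{(1-\alpha/3)^2}{1+\alpha/3}\geq 1-\alpha$.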
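    Averaging this inequality over the elements $o\in OPT\setminus S$, and using submodularity and monotonicity of $f$ together with $|OPT|\leq\kappa$, it holds that $\Delta f(S,s)\geq (1-\alpha)\frac{f(OPT)-f(S)}{\kappa}-2\epsilon$.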
\end{proof}





