\section{Preliminary Definitions and Notations}
\label{sec:prelim}
In this section, we introduce the definitions and notation used in the remainder of the paper.
Throughout, we assume that $f:2^U\to\mathbb{R}_{\geq 0}$ is submodular, where $U$ is the ground set of size $n$.
We denote the marginal gain of adding an element $u\in U$ to a set $X\subseteq U$ by $\Delta f(X,u)$, i.e., $\Delta f(X,u):=f(X\cup\{u\})- f(X)$.
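
For reference, we recall the standard diminishing-returns characterization of submodularity (a well-known equivalence, restated here in the notation above): $f$ is submodular if and only if, for all $X\subseteq Y\subseteq U$ and $u\in U\setminus Y$,
\[
\Delta f(X,u)\geq \Delta f(Y,u).
\]
As a hypothetical illustration, let $U=\{1,2\}$ and let $f(X)=\left|\bigcup_{i\in X}A_i\right|$ be the coverage function with $A_1=\{a,b\}$ and $A_2=\{b,c\}$. Then $\Delta f(\emptyset,2)=2$, while $\Delta f(\{1\},2)=f(\{1,2\})-f(\{1\})=3-2=1$.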

We first define the noisy model of access to $f$. Given any subset $X\subseteq U$ and element $u\in U$, independent samples can be drawn from a distribution $\mathcal{D}(X,u)$ to obtain noisy evaluations of $\Delta f(X,u)$. We denote a random variable distributed according to $\mathcal{D}(X,u)$ by $\widetilde{\Delta f}(X,u)$. We assume the following properties of $\mathcal{D}(X,u)$: (i) $\mathbb{E} [\widetilde{\Delta f}(X,u)]=\Delta f(X,u)$; and (ii) $\widetilde{\Delta f}(X,u)$ is bounded in the range $[0, R]$ for all $X,u$ (or, in some results, is assumed to be $R$-sub-Gaussian).\footnote{A random variable bounded within the interval $[0,R]$ is $R/2$-sub-Gaussian (by Hoeffding's lemma). Consequently, the sub-Gaussian assumption is more general than boundedness.} Our setting also covers applications in which noisy queries are made directly to $f$ rather than to the marginal gain (see Section \ref{appdx:related_work} in the appendix of the supplementary material for more details).
Below we describe three motivating examples of our noisy setting and illustrate the value of $R$ in each.

%\begin{enumerate}
    \textbf{Diversified recommender systems with human feedback.} The goal here is to select a subset of items to recommend to users. The objective function is the expected total number of clicks by the users, typically defined by the cascading linear submodular bandit model \citep{hiranandani2020cascading}. In this setting, the objective function is an expectation and can only be estimated through noisy feedback from the users. A noisy sample corresponds to querying a person for feedback, and samples are i.i.d. Each feedback value is bounded by $1$. Therefore, $R$ can be set to $1/2$ for Theorem \ref{thm:sampling} and to $1$ for Theorem \ref{thm:sampling2}.
    
    \textbf{Multilinear extension.} This setting applies specifically to our algorithm CCTG, a continuous algorithm that uses the multilinear extension of $f$ to achieve an improved approximation guarantee for the matroid constraint. The multilinear extension is commonly used in submodular optimization algorithms and is defined as $\vect{F}(\vect{x})=\sum_{S\subseteq U}\prod_{i\in S}x_i\prod_{j\notin S}(1-x_j)f(S)$, where $\vect{x}\in[0,1]^n$. Computing the multilinear extension exactly requires an exponential number of queries to $f$, so algorithms based on it typically rely on sampling to approximate its value. Noisy queries to the multilinear extension can be obtained by taking i.i.d. samples of sets. In this instance, the noisy marginal gain is bounded by the maximum singleton value, so we can set $R$ to be $\max_{s\in U}f(\{s\})$.
    
    \textbf{Stochastic submodular maximization.} The objective function of the stochastic submodular maximization (SSM) problem can be expressed as $f(S)=\mE_{\gamma}[f_{\gamma}(S)]$. To solve this problem, we approximate $f(S)$ by sampling $\gamma$ from its distribution and evaluating $f_{\gamma}(S)$.
    A specific application of this problem is influence maximization, where the objective function is the expected number of nodes in the graph influenced by a seed set $S$. (A detailed definition of influence maximization is presented in Appendix \ref{appdx:compare_to_sample_before}.) Another application of SSM is large-scale weighted-sum submodular maximization, where the objective can be expressed as $f(S)=\sum_{i=1}^Nw_if_i(S)$. Here $N$ is very large, $w_i\geq 0$, and $\sum_{i=1}^Nw_i=1$. Examples of this problem include large-scale facility location optimization. In this problem, evaluating $f$ exactly would be costly, but we can estimate $f(S)$ by sampling an index $I\in[N]$ with $\Pr[I=i]=w_i$, since then $f(S)=\mE_I[f_I(S)]$.
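
    To see why a single sampled index yields an unbiased estimate in the weighted-sum case, the identity can be expanded as follows (a short check, assuming the weights are nonnegative so that they form a probability distribution over $[N]$):
    \[
    \mE_I[f_I(S)]=\sum_{i=1}^N\Pr[I=i]\,f_i(S)=\sum_{i=1}^Nw_if_i(S)=f(S).
    \]
    Each sample therefore costs one evaluation of a single $f_i$, rather than the $N$ evaluations needed to compute $f(S)$ exactly.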
%\end{enumerate}


% If $t$ independent samples have been taken from $\mathcal{D}(X,u)$, then we denote the $i$-th sample for any $i\in\{1,...,t\}$ as $\Delta f_i(X,u)$, and the sample average of the $t$ samples, i.e. $\frac{1}{t}\sum_{i=1}^t\Delta f_i(X,u)$, is denoted as $\widehat{\Delta f_t}(X,u)$.[TODO:  verify if we still need this definition.]
Next, we present the definitions of a fixed $\epsilon$-approximation and of the multilinear extension.


\textbf{Fixed $\epsilon$-approximation.} Given any random variable $X$, an estimate $\widehat{X}$ is a \textit{fixed $\epsilon$-approximation} of $X$ if $\mE X-\epsilon\leq \widehat{X} \leq \mE X+\epsilon$. For any $X$ that is $R$-sub-Gaussian, the average of $O\left(\frac{R^2}{\epsilon^2}\log \frac{1}{\delta}\right)$ independent samples is a fixed $\epsilon$-approximation of $X$ with probability at least $1-\delta$, by an application of Hoeffding's inequality (Lemma \ref{hoeffding} in the appendix of the supplementary material).
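
To make the sample complexity concrete, the following is a sketch under the standard convention (an assumption here; the constants in Lemma \ref{hoeffding} may differ) that $X$ is $R$-sub-Gaussian if $\mE[e^{\lambda(X-\mE X)}]\leq e^{\lambda^2R^2/2}$ for all $\lambda\in\mathbb{R}$. Let $\bar{X}_t$ denote the average of $t$ independent samples of $X$. Then
\[
\Pr\left[\,|\bar{X}_t-\mE X|>\epsilon\,\right]\leq 2\exp\left(-\frac{t\epsilon^2}{2R^2}\right),
\]
and the right-hand side is at most $\delta$ whenever $t\geq \frac{2R^2}{\epsilon^2}\ln\frac{2}{\delta}$. For instance, with $R=1$, $\epsilon=0.1$, and $\delta=0.05$, this gives $t\geq 200\ln 40\approx 737.8$, so $t=738$ samples suffice under this convention.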


\textbf{Multilinear extension.} For any submodular objective $f$, the multilinear extension of $f$ is the function $\vect{F}$ defined by $\vect{F}(\vect{x})=\sum_{S\subseteq U}\prod_{i\in S}x_i\prod_{j\notin S}(1-x_j)f(S)$, where $\vect{x}\in[0,1]^n$. Let $S(\vect{x})$ denote a random set that contains each element $i\in U$ independently with probability $x_i$. Then, by definition, $\vect{F}(\vect{x})=\mE [f(S(\vect{x}))]$.
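
As a small illustration (a worked instance, not part of the formal definition), take $U=\{1,2\}$ and $\vect{x}=(x_1,x_2)$. Expanding the sum over the four subsets of $U$ gives
\[
\vect{F}(\vect{x})=(1-x_1)(1-x_2)f(\emptyset)+x_1(1-x_2)f(\{1\})+(1-x_1)x_2f(\{2\})+x_1x_2f(\{1,2\}),
\]
where each coefficient is the probability that $S(\vect{x})$ equals the corresponding subset. In particular, at an integral point such as $\vect{x}=(1,0)$, we recover $\vect{F}(\vect{x})=f(\{1\})$.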






% We make the noisy observation assumption that the evaluation of any marginal gain with respect to $f$ is noisy and stochastic. Let us denote the observation of $\Delta f(S,s)$ for the $t$-th time as $ f_t(S,s)$. Throughout the paper, we assume the following properties hold for any subset $S$, element $s$, and any time $t$, $t'$:
%\begin{enumerate}
%    \item $\mathbb{E} f_t(S,s)=\Delta f(S,s)$ 
%    \item  $ f_t(S,s)$ is bounded in a range of $R$ (or equivalenetly $\frac{R}{2}$-sub-Gaussian).
%    \item The noise are independent in time. More specifically, $ f_t(S,s)$ and
%    $ f_{t'}(S,s)$ are independent $\forall t\neq t'$.
%\end{enumerate}
%The noisy access model is defined as follows
%\paragraph{Noisy access model. } Let us denote the marginal gain of adding element $s$ to a set $S$ as $\Delta f(S,s)$, i.e., $\Delta f(S,s):=f(S\cup\{s\})- f(S)$.
%\subsection{Notations}
%We provide some additional notations that are used throughout the paper. (i) $\deltafe$ denotes the mean estimate of the marginal gain $\Delta f(S,s)$ at time $t$. i.e., $\deltafe=\frac{1}{t}\sum_{i=1}^t  f_{i}(S,s)$; 

