\section{Introduction}
% {\color{blue}TODO: Page numbers?}

% \textcolor{red}{Some ideas of things to cut out:   You could put the part of the related work where you give an overview of submodular optimization into the appendix (the first three paragraphs) and just put a very short one paragraph overview in their place.}

Submodularity is a property of set functions that arises in many applications, including cut functions in graphs \citep{balkanski2018non}, coverage functions \citep{bateni2017almost}, data summarization objectives \citep{tschiatschek2014learning}, information-theoretic quantities such as mutual information \citep{iyer2021generalized}, and viral marketing in social networks \citep{kempe2003maximizing}. A function $f:2^U\to\mathbb{R}_{\geq 0}$ defined over subsets of a universe $U$ of size $n$ is submodular if for all $X\subseteq Y\subseteq U$ and $u\notin Y$, $f(Y\cup\{u\}) - f(Y)\leq f(X\cup \{u\})-f(X)$. In addition, in many applications $f$ is monotone \citep{tschiatschek2014learning,iyer2021generalized,kempe2003maximizing}, meaning that for all $X\subseteq Y\subseteq U$, $f(X)\leq f(Y)$.
% Approximation algorithms for submodular optimization problems have received a wealth of attention \cite{nemhauser1978analysis,badanidiyuru2014fast,mirzasoleiman2015lazier,balkanski2019exponential,buchbinder2015tight,calinescu2011maximizing}. 
Algorithms for submodular optimization are typically assumed to have value oracle access to $f$. That is, $f$ is a black box that can be queried on any $X\subseteq U$ and returns the value $f(X)$ \citep{nemhauser1978analysis,badanidiyuru2014fast,balkanski2019exponential,buchbinder2015tight}.
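
As a small illustration of the two definitions above, consider a hypothetical coverage instance (the sets below are chosen only for this example and do not come from any application in the paper). Let $U=\{1,2,3\}$, let $A_1=\{a,b\}$, $A_2=\{b,c\}$, $A_3=\{c\}$, and define $f(X)=\left|\bigcup_{u\in X}A_u\right|$. Taking $X=\{1\}$, $Y=\{1,2\}$, and $u=3\notin Y$ gives
\[
f(Y\cup\{3\})-f(Y) = 3-3 = 0 \;\leq\; 1 = 3-2 = f(X\cup\{3\})-f(X),
\]
so the marginal gain of element $3$ does not increase when it is added to the larger set, as submodularity requires. The same pair of sets also satisfies the monotonicity condition, since $f(X)=2\leq 3=f(Y)$.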

% \textcolor{red}{TODO: Quit talking about noisy samples to $f$ but instead say noisy queries to some function.}
% \textcolor{red}{TODO: }
However, in many optimization scenarios, the objective cannot be evaluated exactly and can only be estimated from noisy queries drawn from some distribution. This setting arises in applications such as diversified recommendation systems \citep{yue2011linear, hiranandani2020cascading}, data summarization with human feedback \citep{singla2016noisy}, influence maximization \citep{kempe2003maximizing, wen2017online}, and feature selection tasks \citep{krause2005near}. {Beyond noisy evaluations that are inherent in applications, submodular optimization algorithms that leverage the continuous multilinear extension of a discrete submodular function also access the objective via a noisy approximation \citep{calinescu2011maximizing,badanidiyuru2014fast}.}
%$F$ of the function $f$, where $F$ is accessible only through noisy i.i.d noisy random samples \cite{calinescu2011maximizing,badanidiyuru2014fast}. In this setting, the common approach to evaluate the multi-linear extension $F$ is by taking sufficiently many samples and applying concentration inequalities to estimate the objective function to a fixed-precision \cite{kempe2003maximizing,calinescu2011maximizing}, which is also referred to as fixed $\epsilon$-approximation in this paper (see Section \ref{sec:prelim}). 

In the above scenarios, querying the exact function value is unrealistic.
 Instead, a more realistic model of access to the objective function is bandit feedback, i.e., noisy i.i.d. sub-Gaussian queries. This model of access has led to many recent works on submodular bandits \citep{singla2016noisy,takemori2020submodular,yue2011linear,nie2022explore}.
%a more feasible assumption is that we can access the objective function through noisy, i.i.d. sub-Gaussian queries, which is also referred to as bandit feedback in the submodular bandit literature \cite{singla2016noisy,chen2017interactive}.
In addition, as modern datasets continue to grow in scale,
a fundamental challenge in implementing these algorithms is their high sample complexity, which significantly limits practical feasibility.
%a fundamental challenge in implementing these algorithms is the inefficiency of sample complexity, which significantly impacts practical feasibility. 

% However, modern
% massive datasets demand algorithms that are as efficient as possible in terms of runtime, and in the
% case of submodular optimization algorithms, the main computation time bottleneck for the above
% approach would be the noisy queries to $f$.
%Existing approaches to this setting tend to  take sufficiently many samples and applying concentration inequalities in order to achieve a fixed-precision\footnote{See Section \ref{sec:prelim} for a definition of fixed-precision approximation. } approximation of the objective function and then use an existing approximation algorithm for SM \cite{kempe2003maximizing,calinescu2011maximizing}.
%or utilizing the approaches from existing bandit literature \cite{singla2016noisy}.
%However, this method can be unnecessarily inefficient in terms of sample complexity and runtime, which makes them hard to operate on relatively large instances. This can also be seen from our experimental results.
%$\vect{F}$ as the objective also incorporate sampling of noisy queries \cite{calinescu2011maximizing,badanidiyuru2014fast}. This is because the multi-linear extension of $f$ is defined as $\vect{F}(\vect{x}) = \mathbb{E}_{S\sim\mathcal{D}_{\vect{x}}} f(S)$, where $S$ is a random set. Consequently, we can only obtain noisy access to $\vect{F}$ due to the inherent randomness of $S$. 
%\wenjing{Motivated by the above, our main objective in this paper is to propose approximation algorithms for submodular optimization with improved sample and runtime efficiency compared with existing approaches. It is also noteworthing that the problem of proposing sample-efficient algorithms in this setting is equivalent to the problem of best-arm identification in submodular bandit, where the objective is that to identify a super-arm (subset of the universe) with comparable approximation ratio in as few samples as possible \cite{audibert2010best}, \cite{singla2016noisy}.} The key insight is that an algorithm doesn't necessarily need to approximate $f$ with such fine precision at every query in order to find a solution with an approximation guarantee comparable to the exact value oracle setting. Instead, we propose methods of adaptively approximating the function $f$ based on decisions that the algorithm must make. 


Motivated by this, we study the problem of submodular maximization under bandit feedback with an emphasis on developing algorithms with low sample complexity. In particular, we consider the pure exploration setting, also known as best-arm identification. The objective is to identify a super-arm (a subset of the universe) that satisfies a PAC bound using as few queries as possible. Specifically, a solution set $S$ satisfies the $(\delta, \epsilon)$-PAC bound if, with probability at least $1-\delta$, $f(S)\geq f(S_0)-\epsilon$, where $S_0$ is the solution set output by an approximation algorithm for the submodular maximization problem with an exact value oracle. Accordingly, we evaluate our algorithms using two metrics: the function value of the output solution and sample efficiency.
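
Written out, the $(\delta,\epsilon)$-PAC bound above is the statement
\[
\Pr\bigl[f(S)\geq f(S_0)-\epsilon\bigr]\geq 1-\delta .
\]
As an illustration only, suppose that $f$ is monotone, the constraint is $|X|\leq\kappa$ for a cardinality budget $\kappa$, and $S_0$ is taken to be the output of the standard greedy algorithm of \citet{nemhauser1978analysis}; this choice of $S_0$ is an assumption made for the example. Since the greedy solution satisfies $f(S_0)\geq(1-1/e)\max_{|X|\leq\kappa}f(X)$, the bound would then give
\[
\Pr\Bigl[f(S)\geq (1-1/e)\max_{|X|\leq\kappa}f(X)-\epsilon\Bigr]\geq 1-\delta .
\]
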
% \cite{horel2016maximization,singla2016noisy,hassidim2017submodular,qian2017subset,crawford2019submodular,huang2022efficient}.
% In particular, we assume that the noisy sampling of $f$ is random and is i.i.d sub-Gaussian, . 
The contributions of the paper are as follows:
%\textcolor{red}{TODO: Now talk about the challenges in terms of runtime, combine with next paragraph.}
%In particular, we study the setting where we are not able to query the exact value of $f$, but instead, for any $S\subseteq U$ and $u\in U$, we are able to take noisy samples from a distribution $\mathcal{D}(S,u)$ whose expected value is the marginal gain $\Delta f(S,u) = f(S\cup\{u\})-f(S)$. 
% Data summarization with human feedback \cite{singla2016noisy}, influence maximization \cite{kempe2003maximizing}, feature selection tasks \cite{krause2005near}, and continuous algorithms using multi-linear extension are all examples of this noisy access setting (see Section \ref{sec:exp} for a more detailed description of applications). Instead of measuring the performance of the algorithms by the number of exact queries to $f$, our goal in this paper is to propose algorithms that achieve an approximation guarantee on the exact function value using as few noisy queries to $f$ as possible.
%\citeauthor{singla2016noisy} considered this setting and proposed combining the standard greedy algorithm with the best arm identification problem found in combinatorial bandit literature \cite{chen2014combinatorial}.
%Motivated by this, the contributions of our paper are as follows:
% \begin{itemize}
%     \item abc
% \end{itemize}
% \vspace{-2mm}
\begin{enumerate}[noitemsep]
\item[(i)] We propose the adaptive sampling algorithm \samplong (\samp) in Section \ref{sec:sampling}, which determines with high probability whether the mean of a random variable $X$ is approximately above or below a given threshold $w$, using relatively few random samples.
%By considering the gap between $\mE X$ and threshold $w$, the algorithm achieves an improved instance-dependent sample complexity compared with fixed-precision approximation to $X$.
Intuitively, the required number of samples decreases as the gap between $\mE X$ and $w$ grows, and therefore we can significantly reduce the number of samples relative to the fixed-precision approach (see Section \ref{sec:prelim}) by sampling less when the gap is large.
% \samp is related but significantly different from algorithms used for best-arm-identification in bandit, as we explain in detail in Section \ref{sec:monotone}.
% This sampling algorithm serves as a versatile tool that is used in many noisy submodular algorithms in the subsequent sections.
\samp is used as a subroutine in all of the algorithms proposed in the paper for submodular maximization problems, and as a result these algorithms exhibit an improved sample complexity compared with fixed-precision approximation.
%\item Using \samp as a subroutine, we propose algorithms for various submodular maximization problems under the noisy setting with the results listed as follows. The proposed algorithms exhibit an improved sample complexity compared with fixed-precision approximation. 
% \textcolor{red}{TODO: I think you should actually put the below as items at the first level. It looks worse when they are (a), etc. and it doesn't actually save much space if any.}
%\begin{enumerate}[noitemsep]
\item[(ii)] We address the problem of Monotone Submodular Maximization with Cardinality constraint (MSMC) in Section \ref{sec:monotone}, which is to find a set in $\arg\max\{f(X): X\subseteq U, |X|\leq \kappa\}$.
   We prove two results for the proposed \alglong algorithm (\alg), Theorem \ref{mainthm} and Theorem \ref{thm:monotone2}. Theorem \ref{mainthm} shows that \alg achieves an improved sample complexity compared with the related work of \citet{singla2016noisy}, while achieving the same approximation guarantee. The sample complexity in Theorem \ref{thm:monotone2} improves on that of Theorem \ref{mainthm} in its dependence on the sub-Gaussian parameter $R$, which is important in applications such as influence maximization.
   % \textcolor{red}{TODO: Are the two algorithms just with different input to use the different concentration inequalities? I think it would look better to just refer to it as one algorithm. Or at least in the contributions just talk about one.}
% \textcolor{red}{Talk about approximation guarantee, mention faster than other approaches.}
    % The input parameters to \alg give a tradeoff between the approximation guarantee and the number of noisy marginal gain samples. In the worst case, \alg takes $O(nR^2\log(\kappa/\alpha)\log(n\log(\kappa/\alpha)/\alpha)/(\epsilon^2\alpha))$ noisy samples, where the noisy samples are bounded in the range $[0, R]$. But we give an instance-dependent upper bound on the number of samples that can be much better.
\item[(iii)] In Section \ref{sec:matroid}, we propose and analyze the algorithm \contialglong (\contialg) for the problem of Monotone Submodular Maximization with Matroid constraint (MSMM). MSMM is to find a set in $\arg\max_{S\in\mathcal{M}}f(S)$, where $\mathcal{M}$ is a matroid defined on subsets of the ground set $U$. \contialg accesses the multilinear extension of $f$ via noisy samples, since the multilinear extension can be difficult to compute in general \citep{calinescu2011maximizing,badanidiyuru2014fast}. In particular, we demonstrate that \contialg has an improved sample complexity compared with the algorithm proposed by \citet{badanidiyuru2014fast}.
\item[(iv)] In Section \ref{sec:nonmono}, we propose \texttt{Confident Double Greedy} (\texttt{CDG}) for Unconstrained Submodular Maximization (USM). The goal is to find a subset $S\subseteq U$ that maximizes $f(S)$, where $f$ is not necessarily monotone. The theoretical guarantee on sample complexity is presented in Theorem \ref{thm:nonmono} in the appendix.
%\end{enumerate}
\item[(v)] We experimentally analyze \alg on instances of noisy data summarization and influence maximization. We compare \alg to several alternative methods, including the algorithm of \citet{singla2016noisy}, which is discussed in more detail in Section \ref{sec:relatedwork} and in the appendix.
    %\alg is compared to alternative approaches of taking an $\epsilon$-approximation at each evaluation, and two variants of the approach of \cite{singla2016noisy}.
    The experiments show that \alg is a practical choice that can use substantially fewer samples than the alternative approaches.
\end{enumerate}

Finally, it is important to distinguish our approach from the standard technique of applying concentration inequalities to estimate the objective function to a fixed precision \citep[e.g.,][]{kempe2003maximizing,calinescu2011maximizing}, which we call a fixed $\epsilon$-approximation (see Section \ref{sec:prelim}). Fixed-precision estimation is often highly sample-inefficient, since an algorithm does not necessarily need to approximate $f$ with such fine precision at every query to find a high-quality solution. Instead, we propose methods that adaptively approximate $f$ based on the decisions that the algorithm must make, with an emphasis on minimizing the total number of noisy queries. We further illustrate the significant technical challenges beyond the simple fixed $\epsilon$-approximation approach in Appendix \ref{appdx:compare_to_fixed_eps_approx}.
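
To give a rough sense of the potential savings, the following back-of-the-envelope calculation uses a standard sub-Gaussian tail bound; it is a sketch under stated assumptions, and its constants may differ from the formal definition in Section \ref{sec:prelim}. Assume $X_1,\dots,X_m$ are i.i.d. with mean $\mu$ and satisfy $\mathbb{E}\exp(\lambda(X_i-\mu))\leq\exp(\lambda^2R^2/2)$ for all $\lambda\in\mathbb{R}$, and let $\hat{\mu}_m$ denote their sample mean. Then
\[
\Pr\bigl[|\hat{\mu}_m-\mu|\geq\epsilon\bigr]\leq 2\exp\!\left(-\frac{m\epsilon^2}{2R^2}\right),
\qquad\mbox{so}\qquad
m\geq\frac{2R^2}{\epsilon^2}\ln\frac{2}{\delta}
\]
samples suffice for an $\epsilon$-accurate estimate with probability at least $1-\delta$. With the illustrative values $R=1$, $\epsilon=0.1$, and $\delta=0.05$, this gives $m\geq 200\ln 40\approx 737.8$, i.e., $738$ samples per query. If the task is only to decide on which side of a threshold $w$ the mean lies, and the gap is $|\mu-w|=0.5$, then accuracy $0.5$ is enough and the same bound gives $m\geq 8\ln 40\approx 29.5$, i.e., $30$ samples. This comparison treats the gap as known in advance; an adaptive procedure does not know the gap and would typically pay additional logarithmic factors to keep its confidence intervals valid at every step.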

%In this setting, the common approach to evaluate the multi-linear extension $F$ is by taking sufficiently many samples and applying concentration inequalities to estimate the objective function to a fixed-precision \cite{kempe2003maximizing,calinescu2011maximizing}, which is also referred to as fixed $\epsilon$-approximation in this paper (see Section \ref{sec:prelim}).However, in many algorithms where the key step is the thresholding procedure to evaluate if the marginal gain is above or below a threshold, fixed-precision estimation is often highly sample-inefficient. Our main insight is that an algorithm doesn’t necessarily need to approximate $f$ with such fine precision at every query in order to find a solution with an approximationguarantee comparable to the exact value oracle setting. Instead, we propose methods of adaptivelyapproximating the function $f$ based on decisions that the algorithm must make, with an emphasis on minimizing the total number of noisy queries.}

%In this paper, we consider the PAC learning setting, which is widely considered in other machine learning fields such as the bandit and reinforcement learning literature \cite{even2002pac}.
%\begin{definition}{\textbf{(PAC learning in SMC.) }}
%Fix $\epsilon>0$ and $\delta\in(0,1)$, an algorithm for SMC is $(\epsilon,\delta)$-PAC with an $\alpha$-approximation guarantee if the returned solution set $S$ satisfies that $f(S)\geq \alpha f(OPT)-\epsilon$ with probability at least $1-\delta$.
%\end{definition}
% and $R^2$-sub-Gaussian with the mean that $\mE f_t(a,S)=\Delta f(a,S)$.