%\documentclass{uai2023} % for initial submission
\documentclass[accepted]{uai2023} % after acceptance, for a revised
                                    % version; also before submission to
                                    % see how the non-anonymous paper
                                    % would look like
%% There is a class option to choose the math font
% \documentclass[mathfont=ptmx]{uai2023} % ptmx math instead of Computer
                                         % Modern (has noticable issues)
% \documentclass[mathfont=newtx]{uai2023} % newtx fonts (improves upon
                                          % ptmx; less tested, no support)
% NOTE: Only keep *one* line above as appropriate, as it will be replaced
%       automatically for papers to be published. Do not make any other
%       change above this note for an accepted version.

%% Choose your variant of English; be consistent
\usepackage[american]{babel}
% \usepackage[british]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams

%% Provided macros
% \smaller: Because the class footnote size is essentially LaTeX's \small,
%           redefining \footnotesize, we provide the original \footnotesize
%           using this macro.
%           (Use only sparingly, e.g., in drawings, as it is quite small.)


%======================================================================
% packages and commands added by us
\usepackage{hyperref}
\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage{mathtools}
\usepackage{amssymb}
\usepackage{algorithm}
\usepackage{algpseudocode}
\usepackage{mathrsfs}
\usepackage{multirow} 
\usepackage[capitalize,noabbrev]{cleveref}
\DeclarePairedDelimiter{\ceil}{\lceil}{\rceil}
\DeclarePairedDelimiter\floor{\lfloor}{\rfloor}
\DeclareMathOperator*{\argmax}{arg\,max} 
\DeclareMathOperator*{\argmin}{arg\,min} 
\newcommand{\cmin}{c_\mathrm{min}}

\usepackage{amsthm}

\theoremstyle{plain}
\newtheorem{theorem}{Theorem}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{lemma}[theorem]{Lemma}
\crefname{lemma}{lemma}{lemmas}
\newtheorem{corollary}[theorem]{Corollary}
\theoremstyle{definition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{assumption}[theorem]{Assumption}
% \theoremstyle{remark}

\newtheorem{remark}[theorem]{Remark}
\newtheorem{fact}[theorem]{Fact}
% \newcounter{prob}
\newtheorem{problem}{Problem}

\usepackage{natbib}

\usepackage{xcolor}
\usepackage{soul}
\newcommand{\cjq}[1]{{\color{blue} #1 \color{black}}}
\newcommand{\cjqremove}[1]{{\color{gray} #1 \color{black}}}
\newcommand{\ngy}[1]{{\color{cyan}#1\color{black}}}
\newcommand{\zyh}[1]{{\color{brown}#1\color{black}}}

\usepackage{comment}
\usepackage{graphicx}
\usepackage{caption}
\usepackage{subcaption}

%======================================================================



%% Self-defined macros
\newcommand{\swap}[3][-]{#3#1#2} % just an example

\title{Size-Constrained k-Submodular Maximization in Near-Linear Time}

% The standard author block has changed for UAI 2023 to provide
% more space for long author lists and allow for complex affiliations
%
% All author information is automatically removed by the class for the
% anonymous submission version of your paper, so you can already add your
% information below.
%
% Add authors
\author[1]{\href{mailto:<nieg@iastate.edu>?Subject=Your k-submodular UAI 2023 paper}{Guanyu Nie}{}}
\author[1]{\href{mailto:<yanhui@iastate.edu>?Subject=Your k-submodular UAI 2023 paper}{Yanhui Zhu}{}}
\author[1]{\href{mailto:<yididiya@iastate.edu>?Subject=Your k-submodular UAI 2023 paper}{Yididiya Y. Nadew}{}}
\author[1]{\href{mailto:<sbasu@iastate.edu>?Subject=Your k-submodular UAI 2023 paper}{Samik Basu}{}}
\author[1]{\href{mailto:<pavan@iastate.edu>?Subject=Your k-submodular UAI 2023 paper}{A. Pavan}{}}
\author[1]{\href{mailto:<cjquinn@iastate.edu>?Subject=Your k-submodular UAI 2023 paper}{Christopher John Quinn}{}}
% Add affiliations after the authors
\affil[1]{%
    Computer Science Department\\
    Iowa State University\\
    Ames, IA, USA
}

  
  \begin{document}
\maketitle

\begin{abstract}
  We investigate the problems of maximizing $k$-submodular functions over total size constraints and over individual size constraints. $k$-submodularity is a generalization of submodularity beyond just picking items of a ground set, instead associating one of $k$ types to chosen items.  For sensor selection problems, for instance, this enables modeling of which type of sensor to put at a location, not simply whether to put a sensor or not.   We propose and analyze threshold-greedy algorithms for both types of constraints.  We prove that our proposed algorithms achieve the best known approximation ratios for both constraint types, up to a user-chosen parameter that balances computational complexity and the approximation ratio, while only using a number of function evaluations that depends linearly (up to poly-logarithmic terms) on the number of elements $n$, the number of types $k$, and the inverse of the user-chosen parameter. Other algorithms that achieve the best known deterministic approximation ratios require a number of function evaluations that depends linearly on the budget $B$, while our methods do not.  We empirically demonstrate our algorithms' performance in applications of sensor placement with $k$ types and influence maximization with $k$ topics.
\end{abstract}

\section{Introduction} \label{sec:intro}


There are a number of problems that can be abstracted as selecting a subset of items with a limit on the number of items, and for which redundancy between items can lead to diminishing returns in terms of utility.  Consider the problem of monitoring a traffic network using a limited number of sensors. We want to place sensors in the most informative locations.  Putting additional sensors in close proximity to each other would be redundant and result in little additional information gain.  Likewise, consider the problem of selecting a subset of influencers on social media to seed an advertising campaign.  Sponsoring additional influencers will improve the spread, but if the additional influencers have the same followers, the improvement in the spread may be limited.  Both of these problems, which involve selecting a subset of items, and for which redundancy can lead to diminishing returns, can be modeled as submodular maximization problems \citep{Krause2007NearoptimalOS,kempe2003maximizing}.



A set function $f:2^V\rightarrow \mathbb{R}$ over a set $V$ of $n$ elements is said to be submodular  if, for any  $S\subseteq T\subset V$ and $e\in V\setminus T$, it satisfies the following diminishing returns property, 
\begin{align}
    f(S\cup \{e\}) -f(S) \geq f(T\cup \{e\})-f(T). \nonumber
\end{align}  This inequality means that the marginal gain of adding $e$ to a set is non-increasing as the set gets larger. For many problems, the function $f$ is assumed to be monotone non-decreasing: $f(T)\geq f(S)$ for any $S\subseteq T\subseteq V$. %It is well-known that while 
While maximizing a monotone submodular function without constraints is trivial (the optimal solution is the whole set $V$), the problem of maximizing a monotone submodular function with  just a cardinality constraint of $B$ is NP-hard even to approximate with a ratio above $(1-1/e)\approx0.632$ \citep{Nemhauser1978AnAO}.  Surprisingly, a simple greedy algorithm can achieve the ratio of $(1-1/e)$ using $\mathcal{O}(nB)$ function evaluations \citep{Nemhauser1978AnAO}. While submodular functions  have been used in a number of applications, some problems cannot be modeled well by just selecting a single set. We give two examples to illustrate this.

\textbf{Influence maximization with $k$ topics:} Influence maximization involves identifying a small subset, or seed set, within a network that can achieve the greatest possible spread of information. This selection problem is frequently modeled as a submodular maximization problem in social networks, as noted by \citet{kempe2003maximizing}. However, if the information being spread includes multiple topics with varying effects on the network, the problem becomes more complex. Specifically, due to budget constraints, we must limit our seed set to a specific number of individuals, with each person being assigned a specific topic. In such cases, the standard submodular maximization approach may not be sufficient and could lead to a loss of important information.

\textbf{Sensor placement with $k$ types:} Consider the case where we want to monitor a traffic network through sensors. With a limited budget, we only want to place sensors in the most informative locations. This problem can be modeled as  submodular maximization  \citep{Krause2007NearoptimalOS}. However, the standard submodular maximization model fails to account for scenarios where we have multiple sensor types (e.g., temperature, humidity, illuminance) that need to be installed at each location, with only one sensor per location.

To account for variations in sensor types or sponsored messages, a richer class of functions is needed beyond submodular functions. In particular, the class of \textit{$k$-submodular} functions \citep{cohen2006complexity,kolmogorov2011submodularity,huber2012towards} can be used for these problems. Instead of simply including an item $e\in V$, each selected item is assigned one of $k$ types. Marginal gains can depend on the pair $(e,i)\in V\times \{1,\dots,k\}$. %
%
The special case of $k=2$,  \textit{bisubmodular} functions, has been widely studied.
% There has been extensive research on the special case of $k=2$, which is called \textit{bisubmodular} functions. %
For example, \cite{singh2012bisubmodular} conducted sensor placement experiments with bisubmodular function models. %
%
For general $k$, \cite{ward2014maximizing} mentioned the applications of the $k$-submodular function on sensor placement and feature selection. \cite{ohsaka2015monotone} proposed algorithms for $k$-submodular optimization and applied them to sensor placement and influence maximization problems.

In this paper, we focus on size (cardinality) constraints. Specifically, we consider \textit{total size} (TS) constraints, where there is a limit on the total number of items from the ground set selected regardless of type, and \textit{individual size} (IS) constraints, where each of the $k$ types has its own limit.  Neither problem is a special case of the other.     For a total size constraint, $1/2$ is the best known approximation ratio. For individual size constraints, the best known approximation ratio is $1/3$.  In both cases, there is room for improvement in terms of the run time. A comparison table of works related to our methods can be found in \cref{tab:related-work}.   In \cref{sec:rw} we discuss related works in more detail.  

In this work we propose algorithms for maximizing $k$-submodular functions under TS and IS constraints, achieving the best deterministic approximation ratios (up to a user specified $\varepsilon>0$) while removing linear dependence on the constraint budgets in terms of value oracle complexity.  

\subsection{Our Contributions}
Our contributions are threefold.  First, we propose a threshold greedy algorithm for $k$-submodular maximization under a total size constraint, achieving a $(1/2-\varepsilon)$-approximation guarantee using only $\mathcal{O}(kn\varepsilon^{-1}\log (B\varepsilon^{-1}))$ function evaluations, for any user-chosen $\varepsilon>0$.  Note that $kn$ evaluations are the minimum needed to evaluate each item-type pair once.  This is the first algorithm that achieves a deterministic, near-optimal approximation ratio under total size constraints without linear dependence on the budget $B$ in terms of function evaluations.  Since $B$ can be as large as $n$, this is significant. 

Second, we propose a threshold greedy algorithm for $k$-submodular maximization under individual size constraints, achieving a $(1/3-\varepsilon)$-approximation guarantee using only $\mathcal{O}(kn\varepsilon^{-1}\log (B\varepsilon^{-1}))$ function evaluations, for any user-chosen $\varepsilon>0$. This is the first algorithm that achieves nearly the best known deterministic approximation ratio under individual size constraints without linear dependence on the budget $B$ in terms of function evaluations.  It also removes quadratic dependence on the number of types $k$ in value oracle complexity compared to a stochastic greedy method.  Third, we test our methods using real-world data. 

\subsection{Related Works} \label{sec:rw}
We next review related works on maximizing monotone $k$-submodular functions, grouping works by constraint types. 


\paragraph{Unconstrained:} 
 Unlike the $k=1$ case, %for which it is trivial, 
 maximizing a monotone $k$-submodular function even without constraints is challenging. \citet{Iwata2015ImprovedAA} showed that even achieving an approximation ratio $\alpha\in (\frac{k+1}{2k}, 1]$ is NP-hard. \citet{Ward2014MaximizingKF} achieved $\max\{1/3, 1/(1 + a)\}$ approximation guarantee using $\mathcal{O}(kn)$ number of function evaluations for the unconstrained case, where $a=\max\{1,\sqrt{(k-1)/4}\}$.  \citet{Iwata2015ImprovedAA} improved the guarantee to $\frac{k}{2k-1}$ using the same number of oracle calls. 
 
 \paragraph{Size Constraints:} For size constraints, two types of constraints have been considered in the literature, namely total size (TS) constraints, where the number of items selected shares a common budget, and individual size (IS) constraints, where each of the $k$ types has a budget. \citet{ohsaka2015monotone} analyzed the greedy algorithm and obtained $1/2$ and $1/3$ approximation guarantees for total size (TS) constraints and individual size (IS) constraints, respectively. They also proposed stochastic versions of their greedy algorithms to reduce the number of function evaluations, inspired by the $k=1$ stochastic greedy algorithm proposed in \cite{mirzasoleiman2015lazier}. Those algorithms obtain the same approximation guarantee with a user-specified probability, but reduce time complexity from $\mathcal{O}(knB)$ to $\mathcal{O}(k(n-B)\log B\log \frac{B}{\delta})$ and $\mathcal{O}(k^2n\log \frac{B}{k}\log \frac{B}{\delta})$ for total size and individual size, respectively, with $\delta$ denoting the user-specified failure probability bound. 
 
 \cite{qian2017constrained} proposed a multi-objective evolutionary algorithm for total size constraints, and showed that their algorithm can find a $1/2$-approximate solution using $\mathcal{O}(knB \log ^2 B)$ oracle calls. \cite{matsuoka2021maximization} utilized curvature for $k$-submodular functions, weak $k$-submodularity, and approximate $k$-submodularity to analyze how curvature improves the approximation ratios of the standard greedy and residual random greedy algorithms. %
 The aforementioned works are for the offline setting, which we also consider.  For the streaming setting, \cite{ene2022streaming} proposed an algorithm that achieves a $\frac{1}{2(1+B_{\min}(2^{1/B_{\min}}-1))} \in[0.25,0.2953)$ approximation using only $\mathcal{O}(nk)$ queries, where $B_{\min} =\min_{i\in [k]}{B_i}$.  
 
 \paragraph{Other Constraints:}  Some recent works have considered knapsack constraints. If the cost of an item is the same across all types, \cite{Tang2021OnMaximizing} proposed an algorithm inspired by \cite{khuller1999budgeted,Sviridenko2004ANO} that achieves $\frac{1}{2}(1-\frac{1}{e})$ using $\mathcal{O}(k^4n^5)$ number of function evaluations. \cite{Chen2022Monotone} proposed a partial-enumeration  algorithm inspired by \cite{khuller1999budgeted} that achieves $\frac{1}{4}(1-\frac{1}{e})$ with time complexity $\mathcal{O}(kn^2)$. \cite{pham2021streaming} proposed an algorithm for the streaming setting that achieves a $\frac{1}{4}-\varepsilon$ approximation guarantee and if the cost is over more general item-type pairs, their algorithm achieves $\min \{\frac{\alpha}{2}, \frac{(1-\alpha)k}{(1+\beta)k-\beta}\} - \varepsilon$ using  $\mathcal{O}(\frac{kn}{\varepsilon}\log B)$ queries where $\beta=\max_{i\neq j} \frac{c(e,i)}{c(e,j)}$, and $\alpha\in(0,1],\varepsilon\in (0,1)$ are input parameters. For matroid constraints, \citet{sakaue2017maximizing} proposed an algorithm that achieves a $1/2$-approximation and \citet{matsuoka2021maximization} proposed an algorithm that achieves a $\frac{1}{1+c}$-approximation where $c$ is the curvature. % $c\in[0,1]$ is the curvature. %the curvature of the function. 


\begin{table*}[t]
\centering
\caption{Table of selected related works.  For unconstrained maximization, which both Problems~\ref{problem:TS} and \ref{problem:IS} generalize, it is known that it is NP-hard to approximate better than $\frac{k+1}{2k}>\frac{1}{2}$ \citep{Iwata2015ImprovedAA}.      $^*$For individual size constraints, $B\gets \sum_{i=1}^k B_i$. $^\dagger$This result is for the streaming case, with $B_{\min} =\min_{i\in [k]}{B_i}$. The approximation guarantee is at least 1/4 and achieves its best (0.2953) when $B_{\min}$ tends to infinity. $^\ddagger$Curvature $c\in[0,1]$ with $c=0$ for linear functions.}
\begin{tabular}{c|c|c|c}
\hline
\textbf{Reference} & \textbf{Constraint}                & \textbf{Approximation} & \textbf{Time} \\ \hline
\cite{ohsaka2015monotone}  & \multirow{4}{*}{total size}  & 1/2  & $\mathcal{O}(knB)$    \\
\cite{ohsaka2015monotone} &  & 1/2 with prob. $\geq 1-\delta$      & $\mathcal{O}(kn\log B\log \frac{B}{\delta})$  \\ 
\cite{qian2017constrained} &   & 1/2  & $\mathcal{O}(knB \log ^2 B)$  \\ 
This paper        &    &  $1/2-\varepsilon$ &  $\mathcal{O}(kn\varepsilon^{-1}\log (B\varepsilon^{-1}))$  \\ \hline
\cite{ohsaka2015monotone} & \multirow{5}{*}{individual size$^*$}  & 1/3  & $\mathcal{O}(knB)$  \\
\cite{ohsaka2015monotone}  &    & 1/3 with prob. $\geq 1-\delta$   &   $\mathcal{O}(k^2n\log \frac{B}{k}\log \frac{B}{\delta})$   \\
\cite{ene2022streaming}  &    & $\frac{1}{2(1+B_{\min}(2^{1/B_{\min}}-1))}$$^\dagger$   &   $\mathcal{O}(kn)$   \\
\cite{matsuoka2021maximization}  &    & $\frac{1}{1+2c}$ where $c$ is curvature $^\ddagger$   &   $\mathcal{O}(knB)$  \\
This paper         &    &    $1/3-\varepsilon$   &  $\mathcal{O}(kn\varepsilon^{-1}\log (B\varepsilon^{-1}))$  \\ \hline
\end{tabular}
\label{tab:related-work}
\end{table*}


\section{Problem Statement} %\label{sec:prob-statement}


We begin with background materials and notation.  We then state the  two problems we consider.

For a positive integer $i$, let $[i]:=\{1,\dots,i\}$ denote the set of positive integers up to and including $i$.  For a set $S$, let $|S|$ denote its cardinality. Let $V$ denote a set of \textit{items} (such as locations where a sensor could be placed).  Let $n:= |V|$ denote the number of items.  Let $[k]:=\{1,\dots,k\}$ denote the set of possible \textit{types} (such as available types of sensors).   There are equivalent ways to express solutions. The notation $2^V$ for the power-set of $V$ in the $k=1$ setting can be generalized to  the set $(k+1)^V$ of length-$|V|$  $(k+1)$-ary tuples to denote type assignments for items, with a $0$ indicating no assignment (i.e., no sensor placed in that location).  For clarity,  we will mostly denote solutions by item-type pairs.  Let $\mathcal{S}$ denote the set of subsets of item-type pairs corresponding to elements of $(k+1)^V$, %
%
\begin{align*}
\mathcal{S}:= \{ \bigcup_{ \substack{j\in [n]:\\A(j)\neq 0} } \{( V(j), A(j) )\}\ | \ A\in (k+1)^V  \}.
\end{align*}

Equivalently, $V$ can be partitioned by type,
% Equivalently, $V$ can be partitioned according to the type assignments, with 
\begin{align*}    
\mathcal{X}:= \{ 
 (\!\!\mathop{\cup}_{ \substack{j\in [n]: \\ A(j)=1} }\!\!\! \{V(j)\},\ \ldots, \mathop{\cup}_{ \substack{j\in [n]:\\ A(j)=k} }\!\!\! \{V(j)\}) \ |\ A\in (k+1)^V \}
\end{align*} denoting the sets of partitions of elements by type (among elements with an assignment).  
For any $S \in \mathcal{S}$, for each type $i\in[k]$, we define $U_i(S):=\{a \in V \mid (a, i) \in S\}$ to be the set of items assigned type $i$.  We also define $ U(S) := \bigcup_{i\in[k]} U_i(S)$ to denote the set of items with some type assignment.



We call a function $f:\mathcal{S} \to \mathbb{R}$ \textit{monotone (non-decreasing)} if for any  sets $S, S' \in \mathcal{S}$ over item-type pairs satisfying $S\subseteq S'$, $f(S) \leq f(S')$. We call a monotone function $f:\mathcal{S} \to \mathbb{R}$ \textit{$k$-submodular} if for any sets $S, S' \in \mathcal{S}$ satisfying $S\subseteq S'$, and any item-type pair $(e,i)$ with $e \not \in U(S')$ (i.e., the item has no assigned type in $S'$), $f$ satisfies a diminishing returns property,
\begin{align*}
    f(S \cup \{ (e,i) \}) - f(S) \geq  f(S' \cup \{ (e,i) \}) - f(S').
\end{align*}  We refer to such differences as marginal gains, representing them using conditioning notation $f( (e,i) |S):= f(S \cup \{ (e,i) \}) - f(S)$.
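
As an illustration (a toy instance chosen here for exposition, not one of the applications studied later), consider a coverage-type function. Let $V=\{a,b\}$ and $k=2$, associate with each item-type pair $(e,i)$ a set $C_{e,i}$ of covered points, with $C_{a,1}=\{1,2\}$, $C_{a,2}=\{2,3\}$, $C_{b,1}=\{2\}$, and $C_{b,2}=\{3,4\}$, and define $f(S) := |\bigcup_{(e,i)\in S} C_{e,i}|$. Such an $f$ is monotone, and the marginal gain of a pair counts only points that are not yet covered, so it cannot increase as the conditioning set grows. For instance,
\begin{align*}
    f((b,1)\,|\,\emptyset) &= |\{2\}| = 1, \\
    f((b,1)\,|\,\{(a,1)\}) &= |\{2\}\setminus\{1,2\}| = 0, \\
    f((b,2)\,|\,\emptyset) &= |\{3,4\}| = 2, \\
    f((b,2)\,|\,\{(a,2)\}) &= |\{3,4\}\setminus\{2,3\}| = 1,
\end{align*}
so in both cases the gain with respect to the smaller set is at least the gain with respect to the larger one.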

First, we introduce a lemma presented in \cite{Tang2021OnMaximizing} that will be used in proofs.

\begin{lemma}[\citealp{Tang2021OnMaximizing}] \label{lem:1}
    For any $S, S^{\prime} \in \mathcal{S}$ with $S \subseteq S^{\prime}$, we have
    $$
    f\left(S^{\prime}\right)-f(S) \leq \sum_{(e, i) \in S^{\prime} \backslash S} f((e,i)|S) .
    $$
\end{lemma}
\begin{remark}
We use set notation (with sets over item-type pairs) to simplify the presentation.  We note that $f$, and consequently its marginal gains, are only defined over $\mathcal{S}$, whose members contain no two item-type pairs with the same item.  For non-monotone functions, $k$-submodular functions are those with the above diminishing returns property (referred to as orthant submodularity) and an additional pairwise-monotonicity condition \citep{ward2014maximizing}.
\end{remark}

In the following, let $f$ be an arbitrary non-negative, monotone, $k$-submodular function. We further assume that $f(\emptyset)=0$, which is without loss of generality because otherwise, we can redefine $f(S):=f(S)-f(\emptyset)$ for all $S \in \mathcal{S}$.


We next state the two problems we consider.

\begin{problem}\label{problem:TS} For a monotone $k$-submodular function $f$ %$:(k+1)^V \to \mathbb{R}_{\geq 0}$ 
and total size constraint $B$, solve
\begin{align*}
    \argmax_{S\in \mathcal{S}:\ |S|\leq B} f(S).
\end{align*}
\end{problem}

\begin{problem}\label{problem:IS} For a monotone $k$-submodular function $f$ %$:(k+1)^V \to \mathbb{R}_{\geq 0}$ 
and individual size constraints $\{B_i\}_{i=1}^k$, solve
\begin{align*}
    \argmax_{ \substack{S\in \mathcal{S}: \\ |U_i(S)|\leq B_i\ \forall i\in [k] }} f(S).
\end{align*}
\end{problem}

Neither Problem~\ref{problem:TS} nor Problem~\ref{problem:IS} is a special case of the other, but both generalize unconstrained maximization, which is NP-hard to approximate with a ratio larger than $\frac{k+1}{2k}>\frac{1}{2}$
\citep{Iwata2015ImprovedAA}.  
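
To illustrate how the two feasible regions differ (a small hypothetical instance, used only for exposition), take $V=\{a,b,c\}$ and $k=2$, with total size budget $B=2$ in Problem~\ref{problem:TS} and individual budgets $B_1=B_2=1$ in Problem~\ref{problem:IS}. The set $S=\{(a,1),(b,1)\}$ satisfies $|S|=2\leq B$ but $|U_1(S)|=2>B_1$, so it is feasible for the total size constraint only. The set $S'=\{(a,1),(b,2)\}$ satisfies $|S'|=2\leq B$ and $|U_1(S')|=|U_2(S')|=1$, so it is feasible for both. Conversely, if the total budget were $B=1$ with the same individual budgets, $S'$ would be feasible for the individual size constraints only.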


\section{Threshold Greedy - Total Size}

\begin{algorithm}[t]
\caption{$k$-submodular Threshold Greedy-TS}
\label{alg:k-sub-ts}
\begin{algorithmic}
    \State {\bfseries Input:} access to a value oracle for a monotone $k$-submodular function $f: (k + 1)^V \rightarrow \mathbb{R}^+$, an integer budget $B\in \mathbb{Z}^+ $ and a tolerance parameter $\varepsilon>0$.
    \State {\bfseries Output:} a set $S$ of item-type pairs  with $|U(S)| \leq B$.
    \State Initialize $S\leftarrow \emptyset$, $\tau \leftarrow d= \max_{e\in V, i\in [k]} f(\{(e,i)\})$ 
    \While{$\tau>\frac{(1-\varepsilon)\varepsilon d}{2B}$} 
        \For{$e,i \in  V\setminus U(S), [k]$} 
            \If{$|U(S)|< B$ and $f((e,i)|S) \geq \tau$}
                \State $S\gets S\cup \{(e,i)\}$.
            \EndIf
        \EndFor
        \State Update $\tau \leftarrow (1-\varepsilon)\tau$.
    \EndWhile
    \State {\bfseries Return} $S$
\end{algorithmic}
\end{algorithm}

In this section, we present our first algorithm designed for Problem~\ref{problem:TS}, maximizing a $k$-submodular function under a total size constraint $B$. The pseudo-code is presented in \cref{alg:k-sub-ts}. The algorithm design is inspired by the threshold greedy algorithm for $k=1$ submodular maximization proposed by \citet{Badanidiyuru2014FastAF}. \cref{alg:k-sub-ts} uses a decreasing sequence of thresholds, starting from %$d= \max_{e\in V, i\in [k]}\Delta_{e,i}f(\emptyset)$
$d:= \max_{e\in V, i\in [k]} f(\{(e,i)\})$, which is the largest value among any item-type pair. For each threshold $\tau$ considered, the algorithm iterates over  all item-type pairs that are still feasible, in an arbitrary order.  A feasible item-type pair $(e,i)$ is added to the current solution $S$ if its  marginal gain with respect to $S$ is above the current threshold, $f((e,i)|S) \geq \tau$. %The ordering of the \textbf{for} loop over item-type pairs is arbitrary. 
 % For implementation, we will use lazy evaluation \cite{minoux2005accelerated}.  %, adds them to the selected set $S$ if the size constraint is not satisfied. 
After going over all the item-type pairs, the algorithm lowers the threshold and repeats. The algorithm terminates when the selected set uses up the budget, $|S|=B$, or when the threshold reaches its lower bound, $\frac{\varepsilon d}{2B}$. 
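
As a small numerical illustration of the threshold schedule (the values are hypothetical and chosen only so that the arithmetic is exact), suppose $d=8$, $B=2$, and $\varepsilon=1/2$. The stopping value in the \textbf{while} condition of \cref{alg:k-sub-ts} is
\begin{align*}
    \frac{(1-\varepsilon)\varepsilon d}{2B} = \frac{\tfrac{1}{2}\cdot\tfrac{1}{2}\cdot 8}{4} = \frac{1}{2},
\end{align*}
so the thresholds used are $\tau = 8, 4, 2, 1$, and the loop exits once $\tau$ is lowered to $1/2$. In this instance the \textbf{while} loop therefore runs four times, regardless of $n$ and $k$.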

\begin{remark} \label{remark:lazy:TS}
 For implementation, we use lazy evaluation \citep{minoux2005accelerated}.  If the output set $S$ does not use up the size budget $B$, we can pad $S$ with extra feasible elements to use up the budget, chosen for instance based on their previously computed marginal gains with respect to earlier conditioning sets (if using lazy evaluation) or at random.  Provided this step does not involve value queries, the query complexity is unchanged, and by monotonicity the following guarantees still hold.
\end{remark}

We next state our main results for Problem~\ref{problem:TS}.

\begin{theorem} \label{thm:main1}
    % The output of 
    \cref{alg:k-sub-ts} achieves a $(1/2-\varepsilon)$-approximation for the problem of maximizing a monotone $k$-submodular function under a total size constraint using at most $\mathcal{O}(nk\varepsilon^{-1}\log (B\varepsilon^{-1}))$  function evaluations.
\end{theorem}




\begin{proof}

\textbf{Run-time:}  The run-time is dominated by function evaluations.  Each pass of the \textbf{for} loop uses $\mathcal{O}(nk)$ function evaluations. The outer \textbf{while} loop is executed $t'$ times, where $t'$ is the smallest integer such that $(1-\varepsilon)^{t'} d \leq \frac{(1-\varepsilon)\varepsilon d}{2B}$.  Let $t$ denote the real value at which equality holds, so $t' = \lceil t \rceil$.  Rearranging, $t$ satisfies    
\begin{align}
    t &= 1 - \frac{\log (2B\varepsilon^{-1})}{\log (1-\varepsilon)} %\nonumber\\
        %
    % &
    \leq 1+ \frac{\log (2B\varepsilon^{-1})}{\varepsilon}, \nonumber
\end{align}
where the last inequality follows from $\log (1-x)<-x$ for $0<x<1$. Then, $t'$ can be upper bounded by $\mathcal{O}(\varepsilon^{-1}\log (B\varepsilon^{-1}))$ by observing that $t' \leq t+1$.
Thus, with $\mathcal{O}(\varepsilon^{-1}\log (B\varepsilon^{-1}))$ calls of the outer \textbf{while} loop, the total run time is  $\mathcal{O}(nk\varepsilon^{-1}\log (B\varepsilon^{-1}))$.
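
As a numerical sanity check of this bound (with hypothetical parameter values, and $\log$ taken as the natural logarithm), let $B=100$ and $\varepsilon=0.1$. Then $\log(2B\varepsilon^{-1}) = \log 2000 \approx 7.601$ and $\log(1-\varepsilon) = \log 0.9 \approx -0.1054$, so
\begin{align*}
    t = 1 - \frac{\log 2000}{\log 0.9} \approx 73.14
    \;\leq\; 1 + \frac{\log 2000}{0.1} \approx 77.01,
\end{align*}
giving $t' = \lceil t \rceil = 74$ iterations of the \textbf{while} loop. Doubling the budget to $B=200$ adds only about $\log 2 / 0.1054 \approx 6.6$ to $t$, which reflects the logarithmic dependence on $B$.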

\textbf{Approximation guarantee:} %
%
For proving the approximation guarantee, we divide the problem into two cases. The first case is when the budget is used up, i.e., the algorithm has selected $B$ items when it terminates. Similar to the construction of \citet{ohsaka2015monotone}, we consider swapping one element at a time from the output of our algorithm, $S$, with one item in the optimal set $\mathrm{OPT}$. Since $|\mathrm{OPT}| = B$, we need at most $B$ steps to construct $\mathrm{OPT}$ starting from $S$. Then, we show that each swapping step results in a small gain in the function value, and thus the total advantage of $\mathrm{OPT}$ over $S$ is also not large. The second case is when the algorithm has selected fewer than $B$ items. The result from the first case also holds if the algorithm selected $B$ items ignoring the size constraint. Then, since the gain of the items exceeding the budget is small (less than the minimum threshold by the selection rule), the total gain of those items is also small. This will indicate that the value of the selected set is also not far from the value of $\mathrm{OPT}$. Let $S^\circ$ denote the solution output by \cref{alg:k-sub-ts}.


\textbf{Case 1:} When the final selected $S^\circ$ satisfies $|U(S^\circ)|=B$. 

Let $\left(e_j, i_j\right) \in V \times[k]$ denote the $j$-th pair chosen by the algorithm, and let $S_j$ denote the set $S$ after $\left(e_j, i_j\right)$ was added. We define $S_0=\emptyset$. Let $\mathrm{OPT}$ be the optimal solution. In the following, we will compare the marginal gains of item-type pairs in $S^\circ$ to the marginal gains for item-type pairs in $\mathrm{OPT}$.  We will construct a sequence of subsets of item-type pairs, combining pairs from $S^\circ$ and $\mathrm{OPT}$ in order to show inequalities resulting in the stated approximation bound.  Some care will be needed for item-type pairs $(e,i)\in S^\circ$ and $(e,i') \in \mathrm{OPT}$ where an item $e$ is included in both $S^\circ$ and  $\mathrm{OPT}$ but with different types.

We begin by indexing the item-type pairs in the output $S^\circ = \{ (e_1,i_1), \dots, (e_B,i_B)\} $ in the order they were added to form $S^\circ$.  We next let $S_j$ denote $S$ after $j$ elements were added, so $S_j := \{ (e_1,i_1), \dots, (e_j,i_j)\}$ for $j\in[B]$ and we set $S_0:=\emptyset$ as the initial empty set.  Thus by construction %
% \begin{align}
%     f(S_{j+1}) - f(S_j) = f((e_{j+1},i_{j+1})|S_j) .\nonumber
% \end{align}
$f(S_{j+1}) - f(S_j) = f((e_{j+1},i_{j+1})|S_j).$


We also index the item-type pairs in the optimal solution $\mathrm{OPT} = \{ (e_1',i_1'), \dots, (e_B',i_B')\} $ using the same indices in $\mathrm{OPT}$ as we have in $S^\circ$ for pairs containing the same items.  That is, if the item $e_j$ in the $j$th selected pair $(e_j,i_j)$ %in $S^\circ$ 
in \cref{alg:k-sub-ts} is also in a pair $(e_j,i')$ in $\mathrm{OPT}$, the latter pair should have the same index (even though the type $i_j$ and $i'$ assigned to that item in $S^\circ$ and $\mathrm{OPT}$ respectively may be different).  For other pairs in  $\mathrm{OPT}$, the indexing is arbitrary.   With this alignment of indices of pairs in $S^\circ$ and $\mathrm{OPT}$ that share a common item, we construct a sequence of cardinality $B$ sets $O_0$, $O_1$, \dots, $O_B$, by swapping elements of $\mathrm{OPT}$ with elements of $S^\circ$ in increasing order of the indexing (i.e. beginning with the first pair selected by \cref{alg:k-sub-ts}),
%
\begin{align*}
    %
    O_0 :=& \{ (e_1', i_1'), (e_2', i_2'), \dots, (e_{B-1}',i_{B-1}'), (e_B',i_B')   \} \nonumber \\
    %
        %
    O_{1} :=& \{ (e_1, i_1), (e_2', i_2'), \dots, (e_{B-1}',i_{B-1}'), (e_B',i_B')   \} \nonumber\\
 %S^\circ =
    %
        %
    \vdots& \nonumber\\
    %
    O_{B-1} :=& \{ (e_1, i_1), (e_2, i_2), \dots, (e_{B-1},i_{B-1}), (e_B',i_B')   \} \nonumber\\
    %
    O_B := & \{ (e_1, i_1), (e_2, i_2), \dots, (e_{B-1},i_{B-1}), (e_B,i_B)   \}. \nonumber
    %
%\mathrm{OPT} =
\end{align*}
By construction, for $j\in\{0,\dots,B\}$ we have $S_j \subseteq O_j$. Furthermore, for $j+1 \in [B]$, we have $S_j \subseteq O_j \cap O_{j+1}$.  By this construction, we also have that at the time when \cref{alg:k-sub-ts} selected its $(j+1)$st pair $(e_{j+1},i_{j+1})$, the aligned pair in $\mathrm{OPT}$, $(e_{j+1}',i_{j+1}')$ was also feasible.  This entails that both the item was still available,
\begin{align}
    e_{j+1}' \not \in U(S_j) \quad \forall \ j+1\in[B], \label{eq:feas:TS}
\end{align} and that the budget had not yet been consumed.  The budget condition is immediate, since only $j<B$ pairs had been selected at that time.  Property \eqref{eq:feas:TS} follows by construction.  Either the items in the $(j+1)$st pairs of $S^\circ$ and $\mathrm{OPT}$ match, $e_{j+1}' = e_{j+1}$ (which is why the two pairs were aligned in the first place), or they differ, in which case the item $e_{j+1}'$ was never chosen by \cref{alg:k-sub-ts}, so $e_{j+1}' \not \in U(S^\circ)$.  In the latter case the position of the pair in the ordering of $\mathrm{OPT}$ is arbitrary, but importantly $e_{j+1}'$ was always available.

We consider the difference $f(O_j) - f(O_{j+1})$.  This is not a marginal gain since neither set contains the other.  However, since the sets differ in exactly one index (the $(j+1)$st) by construction, we can bound the difference.  %
%
\begin{align}
    &f(O_j) - f(O_{j+1}) \nonumber\\
    &= \big( f(O_j\cap O_{j+1}) + f( (e_{j+1}',i_{j+1}') | O_j\cap O_{j+1} ) \big) \nonumber\\
    %
    &\qquad - \big( f(O_j\cap O_{j+1}) + f( (e_{j+1},i_{j+1}) | O_j\cap O_{j+1} ) \big)  \nonumber\\
    %
    %
    % &=f( (e_{j+1}',i_{j+1}') | O_j\cap O_{j+1} ) - f( (e_{j+1},i_{j+1}) | O_j\cap O_{j+1} ) \nonumber\\
    %
    %
    &\leq f( (e_{j+1}',i_{j+1}') | O_j\cap O_{j+1} ) \label{eq:thm1:ub1} \\
    %
    %
    &\leq f( (e_{j+1}',i_{j+1}') | S_j ), \label{eq:thm1:ub2}
\end{align}
where \eqref{eq:thm1:ub1} follows from monotonicity and \eqref{eq:thm1:ub2} follows from orthant submodularity with $S_j \subseteq O_j\cap O_{j+1}$.
We cannot directly relate the marginal gain $f( (e_{j+1}',i_{j+1}') | S_j )$ to that achieved by the $(j+1)$st pair added to $S^\circ$ in \cref{alg:k-sub-ts}, $f( (e_{j+1},i_{j+1}) | S_j )$, since the ordering of the \textbf{for} loop is arbitrary.  We can, however, bound the ratio of those two marginal gains based on how the threshold $\tau$ is shrunk.  We first consider the case where the threshold was the initial threshold and then consider the alternative case.

\paragraph{Sub-case $\tau_{j+1}=d$:} Suppose the threshold $\tau_{j+1}$ when the $(j+1)$st pair was added in \cref{alg:k-sub-ts} was the initial threshold, $\tau_{j+1}=d,$ by construction the largest of all marginal gains.  Then \begin{align}
    f( (e_{j+1},i_{j+1}) | S_j )%&=\tau_{j+1} \nonumber\\
    =\tau_{j+1}
    &=d %\nonumber\\
    % &
    \geq f( (e_{j+1}',i_{j+1}') | S_j ). \nonumber
\end{align}  

\paragraph{Sub-case $\tau_{j+1}<d$:}
Suppose the threshold $\tau_{j+1}<d,$ equivalently \cref{alg:k-sub-ts} is in the second or later execution of the  outer \textbf{while} loop. The pair $(e_{j+1}',i_{j+1}')$ was considered in the previous \textbf{while} loop execution but not added.  (Recall the element $e_{j+1}'$ is either $e_{j+1}$ or an element never chosen in $S^\circ$, and thus any pair containing $e_{j+1}'$ was still feasible in the previous \textbf{while} loops.)  Since the pair $(e_{j+1}',i_{j+1}')$ was not added, its marginal gains with respect to the %current 
greedy set  in the previous \textbf{while} loops, one of $\{S_j, S_{j-1}, \dots,\emptyset\}$, must have been below the previous threshold $\tau_{j+1}(1-\varepsilon)^{-1}$.  By orthant submodularity, marginal gains are non-increasing in the conditioning set, so \begin{align}
    f( (e_{j+1}',i_{j+1}') | S_j ) %
    %
    &\leq (1-\varepsilon)^{-1} \tau_{j+1} \label{eq:thm1:unchosen}\\
    %
    &\leq 
    (1-\varepsilon)^{-1} f( (e_{j+1},i_{j+1}) | S_j ),\label{eq:thm1:chosen}
\end{align} 
where \eqref{eq:thm1:unchosen} holds because $(e_{j+1}',i_{j+1}')$ is not chosen in a previous round and \eqref{eq:thm1:chosen} holds because $(e_{j+1},i_{j+1})$ is chosen in this round.
Thus, whether $(e_{j+1},i_{j+1})$ was added during the first execution of the \textbf{while} loop or later, we have %
%
\begin{align}
    f(O_j) - f(O_{j+1}) 
    &\leq f( (e_{j+1}',i_{j+1}') | S_j ) \nonumber\\
    %
    &\leq 
    (1-\varepsilon)^{-1} f( (e_{j+1},i_{j+1}) | S_j ) \nonumber\\
    &= (1-\varepsilon)^{-1} \big( f(S_{j+1}) - f(S_j)\big). \label{eq:prf:TS:OtoSbnd}
\end{align}
Using a telescoping sum with $O_0=\mathrm{OPT}$ and $O_B = S^\circ$,
\begin{align}
    f(\mathrm{OPT})-f(S^\circ)&=\sum_{j=0}^{B-1}\big(f(O_j)-f(O_{j+1})\big)\nonumber\\
    %
    &\leq \sum_{j=0}^{B-1} \frac{1}{1-\varepsilon}(f(S_{j+1})-f(S_j)) \nonumber\\ %\tag{by \eqref{eq:prf:TS:OtoSbnd}}\\
    %
    % &=\frac{1}{1-\varepsilon}(f(S^\circ)-f(\emptyset)) \nonumber\\
    %
    &= \frac{1}{1-\varepsilon}f(S^\circ), \nonumber
\end{align}
using \eqref{eq:prf:TS:OtoSbnd} and $f(\emptyset)=0$, which for $\varepsilon<1$ implies 
\begin{align}
    f(S^\circ) %
    %
    &\geq \frac{1-\varepsilon}{2-\varepsilon}f(\mathrm{OPT}) \nonumber\\
    %
    &\geq \left(\frac{1}{2}-\varepsilon\right)f(\mathrm{OPT}). \label{eq:prf:TS:case1:Sbnd}
\end{align}
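In more detail (a short unpacking of the two steps above that uses only quantities already defined), moving $f(S^\circ)$ to the right-hand side of the telescoped bound gives
\begin{align*}
    f(\mathrm{OPT}) \leq \Big(1+\frac{1}{1-\varepsilon}\Big) f(S^\circ) = \frac{2-\varepsilon}{1-\varepsilon}\, f(S^\circ),
\end{align*}
which rearranges to the first inequality.  The second inequality holds for every $\varepsilon\in[0,1)$ because
\begin{align*}
    \frac{1-\varepsilon}{2-\varepsilon} - \Big(\frac{1}{2}-\varepsilon\Big)
    = \frac{2(1-\varepsilon) - (1-2\varepsilon)(2-\varepsilon)}{2(2-\varepsilon)}
    = \frac{\varepsilon(3-2\varepsilon)}{2(2-\varepsilon)} \geq 0.
\end{align*}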

\textbf{Case 2:} When the final selected $S^\circ$ satisfies $|U(S^\circ)|<B$. 
Let $\ell=|U(S^\circ)|<B$ denote the number of elements added.  Let $\tilde{S}$ denote a set of cardinality $B$ that \cref{alg:k-sub-ts} would have selected if \cref{alg:k-sub-ts} terminated only when either (a) $B$ pairs had been selected or (b) the marginal gains on all remaining elements evaluated as zero. Without loss of generality, we only consider (a), as we could trivially identify that event (b) was occurring and add an arbitrary feasible subset of item-type pairs so that $|U(S^\circ)|=B$ without any reduction in value, and the following inequalities will still hold (any item-type pairs in $\mathrm{OPT}$ that were feasible once all marginal gains reduced to zero, but not chosen to pad $S^\circ$, have equal marginal gains to those that were). % (b) would imply $f(\tilde{S}) = f(\mathrm{OPT})$ and subsequently the same bounds as we will show for (a).  
Thus, by construction $S^\circ \subset \tilde{S}$ and $\tilde{S}$ has $B-\ell$ extra elements.

First, since $\tilde{S}$ has $B$ elements selected according to decreasing thresholds, the result \eqref{eq:prf:TS:case1:Sbnd} from \textbf{Case 1} holds for $\tilde{S}$, 
% that %$f(\tilde{S}) \geq  (\frac{1}{2}-\varepsilon)f(\mathrm{OPT})$ 
% for $\varepsilon<1$, %
%
\begin{align}
    f(\tilde{S}) &\geq  \frac{1-\varepsilon}{2-\varepsilon}f(\mathrm{OPT}). \label{eq:prf:TS:case2:augS}
\end{align}

Second, since $S^\circ$ accumulated only $\ell$ elements before the terminal threshold bound of $\frac{(1-\varepsilon)\varepsilon d}{2B}$ was reached, the marginal gains of the remaining $B-\ell$ elements in $\tilde{S}$ can be bounded.  The largest possible value of the threshold $\tau$ in the last execution of the \textbf{while} loop is $(1-\varepsilon)^{-1}  \frac{(1-\varepsilon) \varepsilon d}{2B} = \frac{\varepsilon d}{2B}$, so, using Lemma~\ref{lem:1},
\begin{align}
    f(\tilde{S}) - f(S^\circ) %
    %
    &\leq \sum_{(e,i) \in \tilde{S} \backslash S^\circ} f((e,i)|S^\circ) %\tag{using Lemma~\ref{lem:1}}
    \nonumber\\
    %
    % &\leq \sum_{(e,i) \in \tilde{S} \backslash S^\circ} \frac{\varepsilon d}{2B} \nonumber\\
    %
    &\leq (B-\ell) \frac{\varepsilon d}{2B} \nonumber\\
    %
    \Longleftrightarrow \qquad %
    %
    f(S^\circ) &\geq f(\tilde{S}) - (B-\ell) \frac{\varepsilon d}{2B}. \label{eq:prf:TS:case2:augtoS}
\end{align}
We note that, since by construction $S^\circ\subset\tilde{S}$, each of the item-type pairs in $\tilde{S} \backslash S^\circ$ must contain an item not in $U(S^\circ)$, so the marginal gains in the formulas above are well-defined.

With \eqref{eq:prf:TS:case2:augS} and \eqref{eq:prf:TS:case2:augtoS}, monotonicity of $f$, and $d\leq f(\mathrm{OPT})$, 

\begin{align}
    f(S^\circ) %
    %
    &\geq f(\tilde{S}) - (B-\ell) \frac{\varepsilon d}{2B} \tag{by \eqref{eq:prf:TS:case2:augtoS}} \\
    %
    %
    &\geq \frac{1-\varepsilon}{2-\varepsilon}f(\mathrm{OPT}) - (B-\ell) \frac{\varepsilon d}{2B} 
     \tag{by \eqref{eq:prf:TS:case2:augS}} \\
    %
    % &\geq \frac{1-\varepsilon}{2-\varepsilon}f(\mathrm{OPT}) -  \frac{\varepsilon d}{2} 
     % \nonumber\\
        %
    % &\geq \frac{1-\varepsilon}{2-\varepsilon}f(\mathrm{OPT}) -  \frac{\varepsilon f(\mathrm{OPT})}{2} \label{eq:prf:TS:case2:combinedbnd:4}\\
     %
     &\geq \left(\frac{1}{2}-\varepsilon\right)f(\mathrm{OPT}). \nonumber%\label{eq:prf:TS:case2:combinedbnd}
\end{align} 
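The last inequality above compresses a few elementary steps, which can be spelled out as follows.  Since $0\leq B-\ell\leq B$ and $d\leq f(\mathrm{OPT})$,
\begin{align*}
    (B-\ell)\frac{\varepsilon d}{2B} \leq \frac{\varepsilon d}{2} \leq \frac{\varepsilon}{2} f(\mathrm{OPT}),
\end{align*}
and, for every $\varepsilon\in[0,1)$,
\begin{align*}
    \frac{1-\varepsilon}{2-\varepsilon} - \frac{\varepsilon}{2} - \Big(\frac{1}{2}-\varepsilon\Big)
    = \frac{\varepsilon(3-2\varepsilon) - \varepsilon(2-\varepsilon)}{2(2-\varepsilon)}
    = \frac{\varepsilon(1-\varepsilon)}{2(2-\varepsilon)} \geq 0.
\end{align*}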
\end{proof}


\cref{alg:k-sub-ts} and the proof of \cref{thm:main1} generalize the threshold greedy algorithm for submodular ($k=1$) functions proposed by \citet{Badanidiyuru2014FastAF}. A threshold algorithm improves on the time complexity of the greedy algorithm because the greedy algorithm adds only one element per pass over the remaining elements, whereas the threshold algorithm can add multiple elements per pass. By utilizing similar techniques for choosing threshold sequences, we obtain the same reduction in worst-case function evaluations with the same additive reduction in the (worst-case) approximation ratio. More specifically, \citet{Badanidiyuru2014FastAF} achieved a $(1-1/e-\varepsilon)$-approximation guarantee for the $k=1$ case, where $1-1/e$ is the best possible when $k=1$; we achieve a $(1/2-\varepsilon)$-approximation, where $1/2$ is (asymptotically) the best possible for general $k$ \citep{Iwata2015ImprovedAA}. As for the time complexity, \citet{ohsaka2015monotone} improved the dependence on the budget from order $B$ to $\log(B)$ by considering a random set with size of order $\frac{\log(B)}{B}$ when adding each element. They showed that, with high probability, this random set overlaps with the items of an optimal set that are still available. With a threshold strategy, we improve the dependence of the run time on the budget from order $B$ to $\log(B)$ by considering a sequence of $\mathcal{O}(\log(B))$ thresholds.


\section{Thresh. Greedy - Individual Size}

\begin{algorithm}[t]
\caption{$k$-submodular Threshold Greedy-IS}
\label{alg:k-sub-is}
\begin{algorithmic}
    \State {\bfseries Input:} access to a value oracle for a monotone $k$-submodular function $f: (k + 1)^V \rightarrow \mathbb{R}^+$, integers $B_1,\cdots, B_k\in \mathbb{Z}^+ $ and a tolerance parameter $\varepsilon$.
    \State {\bfseries Output:} an item-index pair set $S$ with $|U_i(S)| \leq B_i$ for each $i\in [k]$.
    \State Initialize $S\leftarrow \emptyset$, $B\leftarrow \sum_{i\in [k]}B_i$, $\tau \leftarrow d= \max_{e\in V, i\in [k]}f(\{(e,i)\})$ 
    \While{$\tau>\frac{(1-\varepsilon)\varepsilon d}{3B}$} 
        \State $I\leftarrow \{i\in [k]:|U_i(S)| < B_i\}$
        \For{$(e,i) \in (V\setminus U(S)) \times I$} 
            \If{$e \notin U(S)$, $|U_i(S)| < B_i$, and $f((e,i)|S) \geq \tau$}
                \State $S\gets S\cup \{(e,i)\}$.
            \EndIf
        \EndFor
        \State Update $\tau \leftarrow (1-\varepsilon)\tau$.
    \EndWhile
    \State {\bfseries Return} $S$
\end{algorithmic}
\end{algorithm}

In this section, we present our second algorithm, a threshold greedy algorithm for Problem~\ref{problem:IS}, maximizing a $k$-submodular function under individual size constraints. The pseudo-code is presented in \cref{alg:k-sub-is}. Similar to \cref{alg:k-sub-ts}, the algorithm considers a decreasing sequence of thresholds, starting from $d= \max_{e\in V, i\in [k]} f(\{(e,i)\})$. For each threshold $\tau$ considered, the algorithm searches through all the item-type pairs $(e,i)$ that satisfy the following conditions: the item $e$ has not been chosen with any type, and the number of chosen items with type $i$ has not yet reached $B_i$. While searching, the algorithm includes the item-type pairs whose marginal gains are at least the current threshold. After going over all such item-type pairs, the algorithm decreases the threshold by a factor of $(1-\varepsilon)$ and repeats the search. The algorithm terminates when the selected subset satisfies $|U_i(S)|=B_i$ for all $i\in [k]$ or the considered threshold is no longer above $\frac{(1-\varepsilon)\varepsilon d}{3B}$, where $B = \sum_{i\in[k]} B_i$.  We have the following result.
\begin{theorem} \label{thm:main2}
    \cref{alg:k-sub-is} achieves a $(1/3-\varepsilon)$-approximation for the problem of maximizing a monotone $k$-submodular function under individual size constraints using at most $\mathcal{O}(nk\varepsilon^{-1}\log (B\varepsilon^{-1}))$  function evaluations.
\end{theorem}
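
The query bound can be read off from the threshold schedule in \cref{alg:k-sub-is}; the following is only a brief counting sketch under the assumption $\varepsilon\in(0,1)$, not a replacement for the full proof.  The $t$-th execution of the \textbf{while} loop ($t=0,1,\dots$) uses the threshold $\tau_t=(1-\varepsilon)^t d$ and takes place only if $\tau_t>\frac{(1-\varepsilon)\varepsilon d}{3B}$, that is,
\begin{align*}
    (1-\varepsilon)^{t-1} > \frac{\varepsilon}{3B}
    \quad\Longleftrightarrow\quad
    t-1 < \frac{\ln(3B/\varepsilon)}{\ln\big(1/(1-\varepsilon)\big)} \leq \frac{1}{\varepsilon}\ln\frac{3B}{\varepsilon},
\end{align*}
where the last step uses $\ln\big(1/(1-\varepsilon)\big)\geq\varepsilon$.  Hence there are at most $2+\varepsilon^{-1}\ln(3B/\varepsilon)$ executions of the \textbf{while} loop.  Each execution evaluates at most $nk$ marginal gains (one per item-type pair), and computing $d$ takes another $nk$ evaluations, so the total is
\begin{align*}
    nk + nk\Big(2+\frac{1}{\varepsilon}\ln\frac{3B}{\varepsilon}\Big) = \mathcal{O}\big(nk\varepsilon^{-1}\log (B\varepsilon^{-1})\big).
\end{align*}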

\citet{ohsaka2015monotone} design a stochastic version of the greedy algorithm for the individual size constraint case. Unlike in the total size constraint case, that algorithm needs a random set that overlaps with the available items in an optimal set with high probability \textit{for each type}. This induces an additional factor of $k$ compared with the total size constraint. For a threshold strategy, there is no such concern since we consider all available elements during each iteration of the \textbf{while} loop. It is still feasible to use a sequence of $\mathcal{O}(\log(B))$ thresholds to get the same approximation guarantee. 


In the interest of space, we defer the proof of \cref{thm:main2} to Appendix A. The general structure of the proof is similar to that of \cref{thm:main1}, but extra care is taken in constructing the sequence of intermediate solutions $\{O_1,\dots,O_{B-1}\}$. In the proof of \cref{thm:main1} for a total size constraint, when constructing $O_{j+1}$ from $O_j$, we can swap types without limitation as long as the total budget is not exceeded, as there is no constraint on a specific type. With individual size constraints, however, multiple carefully selected swaps are needed to go from $O_j$ to $O_{j+1}$ while maintaining feasibility.  

This is the first deterministic algorithm to achieve a nearly $1/3$-approximation without linear dependence on the budgets. The stochastic greedy algorithm proposed by \citet{ohsaka2015monotone} achieves a $1/3$-approximation with (at least) a user-specified probability, and its run time bound is larger than ours by a factor of $k$. \citet{ene2022streaming} propose a streaming algorithm with only $\mathcal{O}(kn)$ queries, but its approximation ratio is worse. 


\section{Experiments}
In this section, we empirically evaluate the performance of our proposed methods on two applications: sensor placement with $k$ types and influence maximization with $k$ topics. We compare our results with baselines in terms of both the objective value achieved and oracle complexity. The code for our experiments is available at \url{https://github.com/yididiyan/k_submodular/}.

\paragraph{Baselines: }  We compare our algorithms against the greedy and stochastic greedy algorithms proposed in \citep{ohsaka2015monotone}. For all implementations, we use lazy evaluation. For our proposed threshold algorithms, we consider $\varepsilon=0.1, 0.2, 0.5$ and $0.8$. We note that the worst-case approximation ratios of our threshold algorithms are $\frac{1}{2}-\varepsilon$ and $\frac{1}{3}-\varepsilon$ for total size constraints and individual size constraints respectively, so for $\varepsilon=0.5$ and $0.8$ the worst-case bound becomes vacuous. The algorithm could nonetheless work well in practice, as was shown in experiments in \citep{li2020submodular} for a threshold algorithm for $k=1$ submodular maximization with a knapsack constraint. For stochastic greedy algorithms, we use $\delta=0.1, 0.2, 0.5$ and $0.8$ ($\delta$ bounds the failure probability of achieving the stated approximation ratio) for fair comparisons with threshold greedy algorithms, although in the original paper \citep{ohsaka2015monotone}, only $\delta=0.1$ was considered. We do not consider other baselines mentioned in \cref{tab:related-work} as there are differences in setups. We refer to Appendix B for a more detailed discussion.

\paragraph{Metrics: } We evaluate the performance of our methods and baselines according to two criteria: the objective value and the number of function queries. We first explore how these depend on the constraint parameters, namely the total budget $B$ for total size constraints and the type-specific budgets $\{B_i\}$ for individual size constraints. Then we demonstrate the main advantage of the proposed threshold greedy algorithm over the stochastic greedy algorithm under individual size constraints, namely the improvement by a factor of $k$ in the theoretical guarantee.


\begin{figure}[t]
    \centering
    \begin{subfigure}[b]{0.48\linewidth}
         \centering
         \includegraphics[width=\textwidth]{figs/entropy-TS.png}   
         \caption{Entropy comparison for TS constraints.}
         \label{fig:TS:entropy}
     \end{subfigure}\hfill
     \begin{subfigure}[b]{0.48\linewidth}
         \centering
         \includegraphics[width=\textwidth]{figs/entropy-IS.png}         
         \caption{Entropy comparison for IS constraints.}
         \label{fig:IS:entropy}
     \end{subfigure}
     \begin{subfigure}[b]{0.48\linewidth}
         \centering
         \includegraphics[width=\textwidth]{figs/function-eval-TS.png}         
         \caption{Comparison of function queries for TS constraints.}
         \label{fig:TS:eval}
     \end{subfigure}\hfill
     \begin{subfigure}[b]{0.48\linewidth}
         \centering
         \includegraphics[width=\textwidth]{figs/function-eval-IS.png}         
         \caption{Comparison of function queries for IS constraints.}
         \label{fig:IS:eval}
     \end{subfigure}
    \caption{Sensor placement over $k$ types.}
    \label{fig:sensor}
\end{figure}

\subsection{Sensor Placement} %\subsection{Sensor Placement with Budget varied -- Total Size and Individual Size Constraints}
\paragraph{Sensor Placement with $k$ types:}
In this section, we apply our algorithms for maximizing $k$-submodular functions under total size and individual size constraints to the sensor placement problem with $k$ kinds of sensors. To formally define the problem, we need some notation from information theory. Let $\Omega=\left\{X_1, X_2, \ldots, X_n\right\}$ be a set of discrete random variables. The entropy of a subset $S$ of $\Omega$ is defined as $H(S)=-\sum_{s \in \mathrm{dom} S} \Pr[s] \log \Pr[s]$. The conditional entropy of $\Omega$ having observed $S$ is $H(\Omega \mid S):=H(\Omega)-H(S)$. In the sensor placement problem, we want to place the sensors so as to maximize the reduction of expected entropy, which is equivalent to finding a set $S$ that maximizes the entropy $H(S)$.
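
The equivalence in the last sentence follows in one line from the definition of conditional entropy given above: the reduction in entropy obtained by observing $S$ is
\begin{align*}
    H(\Omega) - H(\Omega \mid S) = H(\Omega) - \big(H(\Omega) - H(S)\big) = H(S),
\end{align*}
and since $H(\Omega)$ does not depend on $S$, maximizing this reduction over feasible $S$ is the same as maximizing $H(S)$.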

Now we formalize the sensor placement problem. There are $k$ kinds of sensors for different measures. For total size constraints, we want to allocate $B$ sensors to a set $V$ of $n$ locations. For individual size constraints, we want to allocate $B_i$ sensors of the $i$-th kind for each $i \in[k]$ to a set $V$ of $n$ locations. In both settings, each location can be instrumented with at most one sensor. Let $X_e^i$ be the random variable representing the observation collected from a sensor of the $i$-th kind if it is installed at the $e$-th location, and let $\Omega=\left\{X_e^i\right\}_{e \in V, i \in[k]}$. Then, the problem is to select $S \in(k+1)^V$ that maximizes $f(S)=H\left(\bigcup_{e \in U(S)}\left\{X_e^{S(e)}\right\}\right)$ subject to $|U(S)| \leq B$ for total size constraints, or $|U_i(S)| \leq B_i$ for each $i \in[k]$ for individual size constraints. %It is shown in
\citet{ohsaka2015monotone} showed that $f$ is monotone $k$-submodular.

\textbf{Experiment settings:} We use the publicly available Intel Lab dataset preprocessed by \citet{ohsaka2015monotone}. This dataset contains approximately $2.3$ million readings collected from 54 sensors deployed in the Intel Berkeley research lab between February 28th and April 5th, 2004. Temperature, humidity, and light values are extracted and discretized into bins of 2 degrees Celsius, 5 points, and 100 lux, respectively. Hence there are $k=3$ kinds of sensors to be allocated to $n=54$ locations. For total size constraints, we set the value of $B$ to 3, 6, 9, $\dots$, 54. For individual size constraints, we denote budgets for sensors measuring temperature, humidity, and light by $B_1, B_2$, and $B_3$ respectively. We set $B_1=B_2=B_3=b$, where $b$ is a parameter varying from 1 to 18.

\textbf{Results:} The results are shown in \cref{fig:sensor}. For total size constraints, \cref{fig:TS:entropy} shows that in terms of function values, all algorithms tested using different hyperparameter values had similar performances. In terms of the number of function evaluations (with lazy evaluations) as shown in \cref{fig:TS:eval}, for the stochastic greedy algorithm, the number of function evaluations stays almost identical regardless of the $\delta$ parameter. One possible reason is that the $\delta$ parameter only appears inside the logarithm term of the size of the stochastic set, so varying it would not affect the size significantly. For our threshold greedy algorithm, however, increasing $\varepsilon$ significantly reduces the number of function evaluations (since the number of thresholds considered is $\mathcal{O}(1/\varepsilon)$) without significant degradation in solution quality. We note there is a drop in the number of function evaluations by stochastic greedy algorithms as the budget $B$ approaches the maximum value $n$ (the number of locations). This phenomenon was also reported by \citet{ohsaka2015monotone} and may be due in part to the formula of the stochastic batch size ($B$ appears in the denominator).

For individual size constraints, \cref{fig:IS:entropy} shows that for function values, all the algorithms tested with varying hyperparameters have similar performances. For the number of function evaluations shown in \cref{fig:IS:eval}, we can see there is a drop as $\delta$ gets larger for the stochastic greedy algorithm, but the drop is small compared with that of the threshold greedy algorithm when $\varepsilon$ is varied. Again, this is due to the fact that the $\delta$ parameter only appears inside the logarithm term of the size of the stochastic set (so it affects the runtime logarithmically), while the runtime of the threshold algorithm scales linearly with $1/\varepsilon$. When both $\delta$ and $\varepsilon$ are set to 0.8, it becomes apparent that the threshold algorithm outperforms the stochastic greedy algorithm in terms of the number of function evaluations. Specifically, at the largest margin, the threshold algorithm requires only one-third of the number of function evaluations required by the stochastic greedy algorithm.
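
As a rough illustration of this contrast, consider how the relevant factors in the worst-case bounds change over the parameter ranges we test.  This is a sketch of bound factors only, not of measured query counts, and it assumes that the $\delta$-dependence of the stochastic set size is a factor of the form $\ln(B/\delta)$.  For $B=54$, moving $\delta$ from $0.1$ to $0.8$ changes that factor by
\begin{align*}
    \frac{\ln(54/0.1)}{\ln(54/0.8)} = \frac{\ln 540}{\ln 67.5} \approx \frac{6.29}{4.21} \approx 1.5,
\end{align*}
whereas moving $\varepsilon$ from $0.1$ to $0.8$ changes the leading factor $\varepsilon^{-1}$ in the bound for the threshold algorithm by
\begin{align*}
    \frac{1/0.1}{1/0.8} = \frac{10}{1.25} = 8.
\end{align*}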

Overall, we observed that increasing the parameter $\delta$ for (baseline) stochastic greedy did not significantly impact the function values or the number of function evaluations. However, increasing $\varepsilon$ in our proposed threshold algorithms resulted in a significant reduction in the number of function evaluations with only a negligible decrease in the function values. For these experiments, our threshold algorithms enable significantly better tradeoffs in balancing accuracy and runtime compared to the baseline methods.

% \subsection{Influence Maximization Experiments with k varied -- Individual Size Constraints}
\subsection{Influence Maximization}

\paragraph{Influence Maximization with $k$ Topics:}
In this problem, a social network is represented as a directed graph $G=(V, E)$ where $V$ is a set of nodes and $E$ is the set of edges. Each edge $(u, v) \in E$ is associated with weights $\{p_{u, v}^i\}_{i \in[k]}$, where $p_{u, v}^i$ represents the strength of influence from user $u$ to $v$ on topic $i$. The objective is to maximize the number of users in the network who eventually get influenced by one of the topics. For information diffusion, we use the \textit{$k$-topic independent cascade} ($k$-IC) model presented in \citet{ohsaka2015monotone}, which generalizes the independent cascade model \citep{kempe2003maximizing}. 

More specifically, the influence spread $\sigma:(k+1)^V \rightarrow \mathbb{R}_{+}$ in the $k$-IC model is defined as $\sigma(S)=\mathbb{E}\left[\left|\bigcup_{i \in[k]} A_i\left(U_i(S)\right)\right|\right]$, where $A_i\left(U_i(S)\right)$ is a random variable representing the set of influenced nodes in the diffusion process of the $i$-th topic.  It is shown in \citep{ohsaka2015monotone} that the influence spread function $\sigma$ is monotone $k$-submodular. Given a directed graph $G=(V, E)$, edge probabilities $\{p_{u, v}^i \mid (u, v) \in E,\ i \in[k]\}$, and a budget $B$ for the total size constraint (or $B_i$ for $i\in [k]$ for individual size constraints), the problem is to select a seed set $S \in (k+1)^V$ that maximizes $\sigma(S)$ subject to $|U(S)| \leq B$ (or $|U_i(S)| \leq B_i$ for each $i\in[k]$). 

\begin{figure}[t]
    \centering
    \begin{subfigure}[b]{0.452\linewidth}
         \centering
         \includegraphics[width=\textwidth]{figs/influence-varyk.png}         
         \caption{Influence spreads.}
         \label{fig:varyk:influence}
     \end{subfigure}
     \hfill
     \begin{subfigure}[b]{0.492\linewidth}
         \centering
         \includegraphics[width=\textwidth]{figs/eval-varyk.png}         
         \caption{Function evaluations.}
         \label{fig:varyk:eval}
     \end{subfigure}
    \caption{Influence maximization over $k$ topics under individual size constraints.}
    \label{fig:IM:IS}
\end{figure}

\textbf{Experiment settings:}  We use the preprocessed Digg network data from \citep{ohsaka2015monotone}, which has 3,523 users, 90,244 links, and $k=10$ topics. Recall that for individual size constraints, the number of function evaluations is $\tilde{\mathcal{O}}(k^2 n)$ for the stochastic greedy algorithm and $\tilde{\mathcal{O}}(k n)$ for the threshold greedy algorithm. We set each individual budget to $b=2$ and compare both the function values and the number of function evaluations when $k$ is varied. Similar to \citep{ohsaka2015monotone}, while the algorithms were running, the influence spread was approximated by simulating the diffusion process 100 times. After the algorithms terminated, we simulated the diffusion process 10,000 times to obtain sufficiently accurate estimates of the spread. % influence spread.

\textbf{Results:} For this set of experiments, the results are shown in \cref{fig:IM:IS}. Regarding influence spread, we can observe from \cref{fig:varyk:influence} that all methods perform comparably, except for some variability at larger values of $k$ due to the randomness of diffusions. However, when considering the number of function evaluations, it becomes apparent in \cref{fig:varyk:eval} that the stochastic greedy algorithm significantly underperforms compared to all other methods. For the threshold greedy algorithms, there is some (though not significant) performance advantage when $\varepsilon$ is set to 0.5.

Interestingly, the greedy algorithm outperforms the stochastic greedy algorithm in terms of the number of function evaluations by a large margin. This is due to the fact that the greedy algorithm allows for a more efficient implementation using lazy evaluations. This phenomenon was also reported by \citet{ene2022streaming}. From this, we can infer that our threshold greedy algorithms also allow for the implementation of lazy evaluations in a similarly efficient manner.

\section{Conclusion}
In this work, we proposed algorithms for the problem of maximizing monotone $k$-submodular functions under size constraints.  We showed that algorithms employing a threshold-greedy strategy improve the run-time among deterministic approximation algorithms with the best-known approximation ratios, and for individual size constraints could even improve on the run-time of a stochastic greedy algorithm.  There are a number of important future directions, including investigating whether similar strategies could improve run-time for more complicated constraints such as matroids and knapsacks, developing application-specific adaptations, and investigating what approximation ratios are achievable for these and related problems.

\begin{contributions}  
    G.~Nie and Y.~Zhu were the lead authors and contributed equally. 
\end{contributions}


\begin{acknowledgements} % will be removed in pdf for initial submission,
						 % (without ‘accepted’ option in \documentclass)
                         % so you can already fill it to test with the
                         % ‘accepted’ class option
    % Pavan's work is partly supported by NSF award 2130536. 
    This material is based upon work supported by the National Science Foundation under Grants No.~2130536 and 2149617.
\end{acknowledgements}

% References
\bibliography{nie_646.bib}
\end{document}
