\section{Preliminaries}
\label{section:preliminaries}
In this section, we define the notation used in the paper. We then
formulate the Slate Bandits problem and present the assumptions that enable us to prove the regret guarantees in Theorem \ref{theorem: Regret OFUL} and Theorem \ref{theorem:TS}.

\paragraph{Notation} The set $\cbrak{1,2,\ldots,N}$ is denoted by $\sbrak{N}$. Unless otherwise specified, we use bold upper case letters for matrices, bold lower case letters for vectors, and upper case calligraphic symbols or Greek letters for sets. For any symmetric matrix $\mathbf{A}$, we denote its minimum and maximum eigenvalues by $\lambda_{\min}(\mathbf{A})$ and $\lambda_{\max}(\mathbf{A})$ respectively. We write $\mathbf{A}\mgeq 0$ if the matrix $\mathbf{A}$ is positive semi-definite, and $\mathbf{A}\mgeq \mathbf{B}$ if $\mathbf{A}-\mathbf{B}\mgeq 0$. For a positive semi-definite matrix $\mathbf{A}$, we define the norm of a vector $\mathbf{x}$ with respect to $\mathbf{A}$ as $\matnorm{\mathbf{x}}{\mathbf{A}} = \sqrt{\mathbf{x}^\top{\mathbf{A}}\mathbf{x}}$ and the spectral norm of $\mathbf{A}$ as $\twonorm{\mathbf{A}} = \sqrt{\eigmax{\mathbf{A}^\top\mathbf{A}}}$. We use $\mathbf{I}_{m}$ and $\mathbf{0}_{m}$ to denote the $m\times m$ identity and zero matrices respectively. When the dimension $m$ is clear from the context, we write $\mathbf{I}$ and $\mathbf{0}$ instead. The symbols $\P$ and $\E$ denote probability and expectation respectively. For sets $\mathcal{A}, \mathcal{X}$ that are subsets of some ambient space $\R^m$, we define the diameter of $\mathcal{X}$ as $\mathrm{diam}(\mathcal{X}) = \max\limits_{\bm{x}_1 ,\bm{x}_2 \in \mathcal{X}} \lVert \bm{x}_1 - \bm{x}_2 \rVert$ and the diameter of $\mathcal{X}$ with respect to $\mathcal{A}$ as $\mathrm{diam}_{\mathcal{A}}(\mathcal{X}) = \max\limits_{\bm{a} \in \mathcal{A}} \max\limits_{\bm{x}_1 , \bm{x}_2 \in \mathcal{X}} \lvert \bm{a}^\top (\bm{x}_1 - \bm{x}_2) \rvert$.

\subsection{Slate Bandits}
In the Slate Bandits problem, a learner interacts with the environment over $T$ rounds. At each round $t\in [T]$, the learner is presented with $N$ finite sets of \emph{items}, $\mathcal{X}^i_t \subset \R^{d}$, $i\in [N]$, and is expected to select one item (say $\mathbf{x}^i_t$) from each $\mathcal{X}^i_t$. Based on the selected $N$-tuple $\mathbf{x}_t = (\mathbf{x}^1_t,\ldots, \mathbf{x}^N_t)$ (called a ``slate''), the learner receives a stochastic binary reward $y_t(\mathbf{x}_t)$. The learner's goal is to select slates $\mathbf{x}_t$, $t\in [T]$, such that her expected regret,
\[
\mathrm{Regret}(T) = \sum\limits_{t=1}^T \bigg\{\max_{\mathbf{x}\in \mathcal{X}_t}\E[y_t(\mathbf{x})] - \E[y_t(\mathbf{x}_t)] \bigg\}
 \]
is minimized\footnote{We also use $R(T)$ for shorthand.}. Here, $\mathcal{X}_t$ denotes the set $\mathcal{X}^1_t\times\cdots\times \mathcal{X}^N_t$ of all possible slates at round $t$. When the chosen slate $\mathbf{x}_t$ is clear from the context, we write $y_t$ for $y_t(\mathbf{x}_t)$. For convenience, we say that the slate $\mathbf{x}_t$ comprises $N$ ``slots'', and that the item $\mathbf{x}_t^i$ is placed in slot $i$ of the slate.


In this work, we consider two well-known settings: $(a)$ \textbf{Stochastic Contextual} and $(b)$ \textbf{Non-Contextual} (also known as the Fixed-Arm setting). In the first setting, we assume that at every round $t \in [T]$, the set $\mathcal{X}_t^i$ is constructed by sampling i.i.d.\ from a distribution $\mathbb{D}_i$ that is unknown to the learner. Moreover, $\mathcal{X}_t^i$ and $\mathcal{X}_s^j$ are sampled independently of one another for all $s,t \in [T]$ and $i,j \in [N]$ with $(s,i)\neq(t,j)$. In the second setting, we assume that $\mathcal{X}_t^i$ remains fixed over time, and we therefore denote it by $\mathcal{X}^i$.

\paragraph{Logistic rewards} In this paper, we assume that the binary reward variable $y_t$ follows a logistic model, i.e.,
$\P[y_t=1 \mid \mathbf{x}_t] = \mu(\mathbf{x}_t^\top\bm\theta^\star)$,
where $\mu:\R\rightarrow\R$ is the logistic function, $\mu(a) = 1/(1+\exp(-a))$, the slate $\mathbf{x}_t$ is identified with the vector in $\R^{dN}$ obtained by stacking its $N$ items, and
$\bm\theta^\star\in \R^{dN}$ is an unknown $dN$-dimensional parameter vector. Similar to prior works on logistic bandits \citep{Faury2020, Abeille2021, Faury2022}, we assume that $\twonorm{\bm\theta^\star}\leq S$, where $S$ is known to the learner, and that $\twonorm{\mathbf{x}^i}\leq 1/\sqrt{N}$ for all $\mathbf{x}^i\in \mathcal{X}_t^i$, $i\in [N]$, $t\in [T]$\footnote{This implies the usual assumption $\twonorm{\mathbf{x}}\leq 1$ for all $\mathbf{x}\in \mathcal{X}_t$.}. Recent logistic bandit literature \citep{Filippi2010, Faury2020, Abeille2021, Faury2022} also identifies a critical parameter $\kappa$ that captures the non-linearity of the reward for the given problem instance, defined as follows.

\begin{equation}
    \kappa = \max_{t\in [T]} \max_{\mathbf{x} \in \mathcal{X}_t,\, \bm\theta\in\Theta} \frac{1}{\dot{\mu}(\mathbf{x}^\top \bm\theta)}
\end{equation}

where $\Theta =\{\bm\theta\in\R^{dN} : \twonorm{\bm\theta}\leq S\}$. Intuitively, $\kappa$ measures the mismatch between the true reward function and a linear approximation of it. Developing algorithms whose regret is independent of $\kappa$ has gained significant attention recently \citep{Faury2020, Abeille2021, Faury2022, Sawarni2024} and is an active area of research. We refer the reader to Section 2 of \citet{Faury2020} for a thorough discussion of $\kappa$ and its implications for regret analysis.
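
As a quick sanity check on the scale of $\kappa$ (a worked bound that uses only the norm assumptions above, and is not a result quoted from the cited works), recall that $\dot{\mu}(a) = \mu(a)(1-\mu(a))$, so that
\[
\frac{1}{\dot{\mu}(a)} = \frac{(1+e^{-a})^2}{e^{-a}} = 2 + e^{a} + e^{-a},
\]
which is even in $a$, equals $4$ at $a=0$, and increases with $\lvert a\rvert$. Since $\twonorm{\mathbf{x}}^2 = \sum_{i\in[N]}\twonorm{\mathbf{x}^i}^2 \leq 1$ for every slate $\mathbf{x}$ and $\twonorm{\bm\theta}\leq S$ for every $\bm\theta\in\Theta$, the Cauchy--Schwarz inequality gives $\lvert\mathbf{x}^\top\bm\theta\rvert\leq S$, and hence
\[
4 \;\leq\; \kappa \;\leq\; 2 + e^{S} + e^{-S}.
\]
Thus $\kappa$ may grow exponentially in $S$; for instance, $S=5$ permits values of $\kappa$ up to roughly $150$.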

\begin{assumption}
    (\textbf{Diversity Assumption}) We describe a key assumption that enables us to design algorithms with low per-round computational complexity and strong regret guarantees (Theorem \ref{theorem: Regret OFUL} in Section \ref{section:main-algo} and Theorem \ref{theorem:TS} in Appendix \ref{appendix:ts-algos}). Let $\mathcal{F}_t$ be the $\sigma$-algebra generated by $\{\mathbf{x}_1, y_1, \ldots, \mathbf{x}_{t-1}, y_{t-1}\}$, and let $\phi = \mathcal{F}_0\subset\mathcal{F}_1\subset \ldots \subset \mathcal{F}_T$ be the associated filtration. For each $i\in [N]$ and $t\in [T]$, we assume that
\[
\E[\mathbf{x}_t^i\mid \mathcal{F}_t] = \mathbf{0} \hspace{1em}\text{and} \hspace{1em} \E[\mathbf{x}_t^i{\mathbf{x}_t^i}^\top \mid \mathcal{F}_{t}] \mgeq \rho \kappa \mathbf{I}
\]

where $\rho >0$ is a fixed constant and $\kappa$ is the non-linearity parameter defined earlier in Section \ref{section:preliminaries}.
\label{assumption: diversity}
\end{assumption}
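
As a consistency check (a short derivation from the stated assumptions, not a result taken from the appendices), note that Assumption \ref{assumption: diversity} restricts how large $\rho$ can be. Taking traces in the second condition and using $\twonorm{\mathbf{x}_t^i}\leq 1/\sqrt{N}$, we obtain
\[
d\rho\kappa = \mathrm{tr}(\rho\kappa\mathbf{I}_d) \;\leq\; \mathrm{tr}\big(\E[\mathbf{x}_t^i{\mathbf{x}_t^i}^\top \mid \mathcal{F}_t]\big) = \E\big[\twonorm{\mathbf{x}_t^i}^2 \mid \mathcal{F}_t\big] \;\leq\; \frac{1}{N},
\]
so that $\rho \leq 1/(dN\kappa)$. Because $\dot{\mu}(a)\leq 1/4$ for all $a\in\R$, we have $\kappa\geq 4$, and therefore $\rho\leq 1/(4dN)$.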

\textbf{Remarks on Assumption \ref{assumption: diversity}: }
Intuitively, the assumption means that, for each slot $i\in [N]$ and round $t\in [T]$, the item features $\mathbf{x}_t^i$ that can be selected by the algorithm are sufficiently ``diverse'', i.e., the expected matrix $\E[\mathbf{x}_t^i{\mathbf{x}_t^i}^\top \mid \mathcal{F}_{t}]$ is full rank and has sufficiently large eigenvalues. In our proofs, this assumption is first used to show that, with high probability, the minimum eigenvalue of the design matrices $\mathbf{W}_t^i = \gamma \mathbf{I} + \sum_{s\in [t]}\dot{\mu}(\mathbf{x}_s^\top\bm\theta_{s+1})\mathbf{x}_s^i{\mathbf{x}_s^i}^\top$ used by our algorithms (Algorithms \ref{algo:batch_OFUL}, \ref{algo:TS}, \ref{algo:TS-Fixed}) grows at least linearly with $t$. In particular, we show (Lemma \ref{lemma: min_eig_design}, Appendix \ref{appendix:general}) that
$\lambda_{\min}(\mathbf{W}_t^i) \geq \gamma + c\rho\kappa t$ for a fixed constant $c>0$. We rely critically on this linear growth of the minimum eigenvalue (Lemma \ref{lemma: ineq on W}, Appendix \ref{appendix: proof_regret_oful} and Lemma \ref{lemma:multiplicative-equivalence}, Appendix \ref{appendix: general_lemmas_ts}) to prove multiplicative equivalence between the block diagonal matrix $\mathbf{U}_t = \mathrm{diag}(\mathbf{W}_t^1, \ldots, \mathbf{W}_t^N)$ and the analogously defined slate-level design matrix
$\mathbf{W}_t = \gamma \mathbf{I} + \sum_{s\in [t]}\dot{\mu}(\mathbf{x}_s^\top\bm\theta_{s+1})\mathbf{x}_s\mathbf{x}_s^\top$. As a result of this multiplicative equivalence, we are able to use slot-level exploration bonuses\footnote{Instead of slate-level exploration.} (leading to low per-round time complexity in Algorithms \ref{algo:batch_OFUL}, \ref{algo:TS} and \ref{algo:TS-Fixed}) while retaining optimal regret. Details of the algorithms and the regret proofs can be found in Sections \ref{section:main-algo}, \ref{section:TS} and Appendix \ref{appendix:ts-algos}. We note that many similar diversity assumptions have been used in the literature, and that the connections between them have also been studied \citep[Section~3]{Papini2021}. Depending on the strength of the assumption, novel and stronger regret guarantees for well-known algorithms have been established (e.g., Lemma~2 of \citet{Papini2021} and Corollary~4 of \citet{Das_2024}). Interestingly, these regret proofs also proceed by first showing a linear lower bound on the minimum eigenvalue of the design matrix. Since the assumption is instance- and algorithm-dependent, there could be instances where the linear lower bound does not hold. To study this, we empirically examine the growth of the minimum eigenvalues $\lambda_{\min}(\mathbf{W}_t^i)$ for a large number of randomly chosen instances and observe a clear linear trend, which is consistent with the assumption at least for these randomly picked instances. More details can be found in Appendix \ref{appendix:empirical-validation}.
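
To make the relationship between $\mathbf{U}_t$ and $\mathbf{W}_t$ concrete, the following sketch writes out their block structure for $N=2$. It uses only the definitions above and identifies the slate $\mathbf{x}_s$ with the stacked vector of its items; the shorthand $w_s = \dot{\mu}(\mathbf{x}_s^\top\bm\theta_{s+1})$ is introduced here purely for illustration. Expanding $\mathbf{x}_s\mathbf{x}_s^\top$ block by block gives
\[
\mathbf{W}_t =
\begin{pmatrix}
\mathbf{W}_t^1 & \sum_{s\in[t]} w_s\,\mathbf{x}_s^1{\mathbf{x}_s^2}^\top \\
\sum_{s\in[t]} w_s\,\mathbf{x}_s^2{\mathbf{x}_s^1}^\top & \mathbf{W}_t^2
\end{pmatrix},
\qquad
\mathbf{U}_t =
\begin{pmatrix}
\mathbf{W}_t^1 & \mathbf{0} \\
\mathbf{0} & \mathbf{W}_t^2
\end{pmatrix}.
\]
That is, $\mathbf{U}_t$ is exactly the block-diagonal part of $\mathbf{W}_t$, and the two matrices differ only in the cross-slot blocks. Informally, multiplicative equivalence says that these cross-slot blocks are dominated by the diagonal blocks once $\lambda_{\min}(\mathbf{W}_t^i)$ grows linearly in $t$; the precise statement and constants are those of the lemmas cited above. If, as slot-level bonuses suggest, the selection rule then decomposes across slots, a round would require scoring on the order of $\sum_{i\in[N]}\lvert\mathcal{X}_t^i\rvert$ items rather than $\prod_{i\in[N]}\lvert\mathcal{X}_t^i\rvert$ slates.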

