\section{\slateglincb}
\label{section:main-algo}




\begin{algorithm}[ht] 
\caption{\texttt{Slate-GLM-OFU}}
\label{algo:batch_OFUL} 
    %\vspace{-3mm}
\begin{algorithmic}[1]
\STATE \textbf{Inputs:} $T, \delta, S$

\STATE Initialize $\mathbf{W}_1^1 = \ldots = \mathbf{W}_1^N = \mathbf{I}_d$, $\mathbf{W}_1 = \mathbf{I}_{dN}$, $\Theta_1 = \{\twonorm{\mathbf{\theta}}\leq S\}$, $\mathbf{\theta}_1 \in \Theta_1$, $\eta_t(\delta) = O(S^2Nd\log(t/\delta))$, and $\H_1 = \emptyset$
        
\FOR{each round $t \in [T]$}
\STATE Obtain the set of items $\mathcal{X}^i_t$, $\forall i \in [N]$, and find
$\mathbf{x}^i_t = \argmax_{\mathbf{x} \in \X^i_t} \sbrak{\mathbf{x}^\top\mathbf{\theta}_t^{i} + \sqrt{\eta_t(\delta)} \matnorm{\mathbf{x}}{(\mathbf{W}_t^i)^{-1}}}$

\STATE Select slate $\mathbf{x}_t = (\mathbf{x}^1_t, \ldots, \mathbf{x}^N_t)$ and get reward $y_t$.

\STATE Obtain $\bm\theta_{t+1}$, $\mathbf{W}_{t+1}$, $\{\mathbf{W}_{t+1}^i\}_{i=1}^N$, $\Theta_{t+1}$, $\mathcal{H}_{t+1}$ by calling Algorithm \ref{algo:adaptive-updates} with inputs $\mathbf{x}_t$, $y_t$, $\bm\theta_t$, $\mathbf{W}_t$, $\{\mathbf{W}_{t}^i\}_{i=1}^N$, $\Theta_t$, $\mathcal{H}_t$
\ENDFOR
\end{algorithmic}
\end{algorithm}

\begin{algorithm}[ht] 
\caption{\texttt{ada-OFU-ECOLog-Updates}}
\label{algo:adaptive-updates} 
\begin{algorithmic}[1]
\STATE \textbf{Inputs:} $\mathbf{x}_t, y_t, \bm\theta_t, \mathbf{W}_t, \{\mathbf{W}_{t}^i\}_{i=1}^N , \Theta_t, \mathcal{H}_t$
\STATE Initialize $\gamma_t(\delta) = O(S^2Nd\log(t/\delta))$ and $\beta_t(\delta) = O(S^6Nd\log(t/\delta))$.
\STATE Compute $\bar{\mathbf{\theta}}_t$, $\mathbf{\theta}^0_t$, and $\mathbf{\theta}^1_t$ using (\ref{theta_bar}) and (\ref{theta_u})

\IF{$\dot{\mu}(\mathbf{x}_t^\top\bar{\mathbf{\theta}}_t) \leq 2\dot{\mu}(\mathbf{x}_t^\top\mathbf{\theta}^u_t)$ for all $u \in \{0,1\}$}

\STATE Let $\mathbf{\theta}_{t+1}$ be the solution of (\ref{equation:optimization}) up to precision $1/t$.

\STATE $\mathbf{W}_{t+1}^i = \mathbf{W}_t^i + \dot{\mu}(\mathbf{x}_t^\top\mathbf{\theta}_{t+1})\mathbf{x}_t^i{\mathbf{x}_t^i}^\top$, $\forall i\in [N]$

\STATE $\mathbf{W}_{t+1} = \mathbf{W}_t + \dot{\mu}(\mathbf{x}_t^\top\mathbf{\theta}_{t+1})\mathbf{x}_t{\mathbf{x}_t}^\top$

 \STATE $\H_{t+1} = \H_{t}$ and $\Theta_{t+1} = \Theta_t$ 
 
 % \STATE $\mathcal{C}_{t+1}(\delta) = \{\lVert \bm\theta - \bm\theta_{t+1} \rVert^2_{\mathbf{W}_{t+1}} \leq \eta_t(\delta)\}$

\ELSE
\STATE $\H_{t+1} = \H_t \cup \{(\mathbf{x}_t, y_{t})\}$. 
\STATE Let $\mathbf{\theta}^\H_{t+1}$ be the solution of (\ref{equation:optimization2}) up to precision $1/t$.
                
\STATE $\mathbf{V}_{t}^\H = \sum_{\mathbf{x} \in \H_t}\mathbf{x}\mathbf{x}^\top/\kappa + \gamma_t(\delta)\mathbf{I}_{Nd}$
                
\STATE $\Theta_{t+1} = \cbrak{\matnorm{\mathbf{\theta}-\mathbf{\theta}^\H_{t+1}}{\mathbf{V}^\H_{t}}^2 \leq \beta_t(\delta)} \cap \Theta_1$
\STATE $\mathbf{\theta}_{t+1} = \mathbf{\theta}_t$, $\mathbf{W}_{t+1} = \mathbf{W}_t$, $\mathbf{W}_{t+1}^i = \mathbf{W}_t^i$, $\forall i\in [N]$
\ENDIF
\STATE \textbf{return} $\bm\theta_{t+1}, \mathbf{W}_{t+1}, \{\mathbf{W}_{t+1}^i\}_{i=1}^N, \Theta_{t+1}, \mathcal{H}_{t+1}$%, \mathcal{C}_{t+1}(\delta)$
\end{algorithmic}
\end{algorithm}

In this section, we present our first algorithm, \slateglincb\ (Algorithm \ref{algo:batch_OFUL}), based on the OFU (Optimism in the Face of Uncertainty) paradigm \citep{Yadkori2011} used in bandit algorithms. At a high level, \slateglincb\ (along with its sub-routine, Algorithm \ref{algo:adaptive-updates}) builds upon the \adaofuecolog\ algorithm (Algorithm $2$ in \cite{Faury2022}), which achieves an optimal ($\kappa$-free) $O(\sqrt{T})$ regret guarantee for logistic reward models and incurs an $O(K\log T)$ per-round computational cost, where $K$ is the total number of actions to choose from. In the slate bandit setting, $K$ is exponential in $N$, the number of slots in the slate, making a direct application of \adaofuecolog\ infeasible when $N$ is large. To address this, \slateglincb\ selects an item for each slot independently, reducing the per-round computational cost to $N^{O(1)}$. Interestingly, despite the independent selection of items to build the slate, \slateglincb\ (via sub-routine Algorithm \ref{algo:adaptive-updates}) estimates only a single reward model using the slate-level reward feedback. This is a critical difference with respect to prior works on slate bandits with bandit feedback \citep{Dimakopoulou2019}, which attribute the single slate-level reward to the individual items in the slate and estimate $N$ separate reward models.

The inputs to \slateglincb\ are $T$, $\delta$, and $S$, where $T$ is the time horizon, i.e., the total number of rounds, $\delta$ is the error probability, and $S$ is a known upper bound on $\twonorm{\bm\theta^\star}$.
Similar to \adaofuecolog\ \citep{Faury2022}, \slateglincb\ maintains vectors $\bm\theta_t$ and sets $\Theta_t$ and $\mathcal{H}_t$. The vector $\bm\theta_t$ provides an estimate of $\bm\theta^\star$ during the $t^{\mathrm{th}}$ round.
The set $\Theta_t \subseteq \Theta_1 =  \{\twonorm{\bm\theta}\leq S\}$ is an admissible set for the values of $\bm\theta_{t+1}$
and contains the true reward parameter $\bm\theta^\star$ with high probability (see Proposition 7 in \cite{Faury2022} for more details). To facilitate adaptivity, \adaofuecolog\ introduced the set $\mathcal{H}_t$ comprising the pairs $(\mathbf{x}_s, y_s)$ ($s < t$) at which an inequality criterion (checked in \emph{Step 4} of Algorithm \ref{algo:adaptive-updates}) fails.
In addition to these, \adaofuecolog\ also introduces a matrix $\mathbf{W}_t= \lambda\mathbf{I} + \sum_{s=1}^{t-1} \dot{\mu}(\mathbf{x}_{s}^\top \mathbf{\theta}_{s+1})\mathbf{x}_{s}\mathbf{x}_{s}^\top$ as an on-policy proxy for the \emph{concentration matrix} $\mathbf{H}_t = \lambda\mathbf{I} + \sum_{s=1}^{t-1}\dot{\mu}(\mathbf{x}_s^\top\mathbf{\theta}^\star)\mathbf{x}_s\mathbf{x}_s^\top$, to enable efficient per-round computation of parameter estimates. In \slateglincb, along with $\mathbf{W}_t$, we also maintain $N$ other such matrices (one for each slot $i\in [N]$), $\mathbf{W}_t^i = \lambda\mathbf{I} + \sum_{s=1}^{t-1} \dot{\mu}(\mathbf{x}_{s}^\top \mathbf{\theta}_{s+1})\mathbf{x}_{s}^i{\mathbf{x}_{s}^i}^\top$. These matrices govern the explore-exploit trade-off when selecting the item for the $i^{\mathrm{th}}$ slot.
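
As a quick consistency check on these definitions (a sketch; the block notation $[\mathbf{A}]_{ij}$ for the $(i,j)^{\mathrm{th}}$ $d\times d$ block of a $dN \times dN$ matrix $\mathbf{A}$ is introduced here only for illustration), write each slate in slot blocks as $\mathbf{x}_s = (\mathbf{x}_s^1, \ldots, \mathbf{x}_s^N)$. Then
\[
[\mathbf{x}_s\mathbf{x}_s^\top]_{ij} = \mathbf{x}_s^i{\mathbf{x}_s^j}^\top
\quad\Longrightarrow\quad
[\mathbf{W}_t]_{ii} = \lambda\mathbf{I}_d + \sum_{s=1}^{t-1} \dot{\mu}(\mathbf{x}_{s}^\top \mathbf{\theta}_{s+1})\mathbf{x}_{s}^i{\mathbf{x}_{s}^i}^\top = \mathbf{W}_t^i .
\]
In other words, provided all matrices share the same regularizer $\lambda$ and are updated in the same rounds (as they are in Algorithm \ref{algo:adaptive-updates}), $\mathbf{W}_t^i$ is the $i^{\mathrm{th}}$ diagonal block of $\mathbf{W}_t$, and the per-slot matrices discard only the cross-slot blocks $[\mathbf{W}_t]_{ij}$ with $i\neq j$.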


Next, we go through the steps of \slateglincb\ (Algorithm \ref{algo:batch_OFUL}) and its sub-routine (Algorithm \ref{algo:adaptive-updates}) in more detail.
\emph{Steps 3-7} (Algorithm \ref{algo:batch_OFUL}) are where \slateglincb\ differs significantly from \adaofuecolog. Instead of getting the set of arm features $\mathcal{X}_t$ (slates in our case) directly from the environment (as in \adaofuecolog), \slateglincb\ receives $N$ sets of items $\mathcal{X}_t^i$, one for each slot $i\in [N]$. Then, for each slot, it picks one item $\mathbf{x}_t^i \in \mathcal{X}_t^i$ using the optimistic rule in \emph{Step 4} (Algorithm \ref{algo:batch_OFUL}). Note that the underlying optimization problem for slot $i$ only requires the candidate items in $\X_t^i$ and the components $\mathbf{\theta}_t^i$ of $\bm\theta_t$ that correspond to the $i^{\mathrm{th}}$ slot, and thus can be solved independently and in parallel for all slots. Why selecting items independently at the slot level leads to optimal selection at the slate level constitutes the core technical part of our regret guarantee (Theorem \ref{theorem: Regret OFUL}). Essentially, we show that, under our diversity assumption (Assumption \ref{assumption: diversity}), the positive definite matrices $\mathbf{W}_t$ and $\mathrm{diag}(\mathbf{W}_t^1,\ldots, \mathbf{W}_t^N)$ are multiplicatively equivalent, which further implies that, for all slates $\mathbf{x}_t = (\mathbf{x}_t^1, \ldots , \mathbf{x}_t^N)$, the quantities $\matnorm{\mathbf{x}_t}{\mathbf{W}_t}$ and $\sum_{i\in [N]}\matnorm{\mathbf{x}_t^i}{\mathbf{W}_t^i}$ are multiplicatively equivalent. This observation is exploited in our algorithm to convert an optimistic selection rule at the slate level into an equivalent optimistic selection rule for each slot.
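
To make the last step concrete, the following is a sketch of how a matrix-level equivalence transfers to the exploration bonus. The constants $c_1, c_2$ and the shorthand $\mathbf{U}_t$ are hypothetical names used only here, and the constants obtained in the analysis may differ. Suppose $c_1\mathbf{U}_t \preceq \mathbf{W}_t \preceq c_2\mathbf{U}_t$ for some $0 < c_1 \leq c_2$, where $\mathbf{U}_t = \mathrm{diag}(\mathbf{W}_t^1,\ldots, \mathbf{W}_t^N)$. Inversion reverses the Loewner order, and $\mathbf{U}_t^{-1}$ is again block diagonal, so for every slate $\mathbf{x} = (\mathbf{x}^1, \ldots, \mathbf{x}^N)$,
\[
\frac{1}{c_2}\sum_{i\in [N]}\matnorm{\mathbf{x}^i}{(\mathbf{W}_t^i)^{-1}}^2 \;\leq\; \matnorm{\mathbf{x}}{\mathbf{W}_t^{-1}}^2 \;\leq\; \frac{1}{c_1}\sum_{i\in [N]}\matnorm{\mathbf{x}^i}{(\mathbf{W}_t^i)^{-1}}^2 .
\]
Combining this with $\sqrt{\sum_i a_i^2} \leq \sum_i a_i \leq \sqrt{N}\sqrt{\sum_i a_i^2}$ for $a_i \geq 0$ gives
\[
\frac{1}{\sqrt{c_2 N}}\sum_{i\in [N]}\matnorm{\mathbf{x}^i}{(\mathbf{W}_t^i)^{-1}} \;\leq\; \matnorm{\mathbf{x}}{\mathbf{W}_t^{-1}} \;\leq\; \frac{1}{\sqrt{c_1}}\sum_{i\in [N]}\matnorm{\mathbf{x}^i}{(\mathbf{W}_t^i)^{-1}} ,
\]
so the slate-level bonus is sandwiched between multiples of the sum of the per-slot bonuses used in \emph{Step 4} (Algorithm \ref{algo:batch_OFUL}).
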
In \emph{Step 5} (Algorithm \ref{algo:batch_OFUL}), we select the slate $\mathbf{x}_t = (\mathbf{x}_t^1, \ldots, \mathbf{x}_t^N)$, yielding a reward $y_t$. At this point, \slateglincb\ calls the sub-routine described in Algorithm \ref{algo:adaptive-updates}, which updates $\bm\theta_t$, $\mathbf{W}_t$, $(\mathbf{W}_t^1, \ldots, \mathbf{W}_{t}^N)$, $\Theta_t$, and $\mathcal{H}_t$. The update rules in Algorithm \ref{algo:adaptive-updates} largely follow those in \adaofuecolog, which are based on the following inequality criterion.
\begin{equation}
\dot{\mu}(\mathbf{x}_t^\top \bar{\bm\theta}_t) \leq 2 \min\{\dot{\mu}(\mathbf{x}_t^\top \mathbf{\theta}^0_{t}), \dot{\mu}(\mathbf{x}_t^\top \mathbf{\theta}^1_{t})\}
\label{equation:adaptivity-criterion}
\end{equation}
Here, $\bar{\bm\theta}_t, \bm\theta_t^0, \bm\theta_t^1 \in \R^{dN}$ are $\mathcal{F}_t$-adapted parameters that enable adaptivity. They are obtained as follows.
\begin{equation}
    \bar{\mathbf{\theta}}_t = \argmin\limits_{\bm\theta \in \Theta_t} \sbrak{\eta \matnorm{\mathbf{\theta} - \mathbf{\theta}_t}{\mathbf{W}_t}^2 + \sum\limits_{u\in \{0,1\}} \ell(\inner{\mathbf{x}_t}{\mathbf{\theta}} , u)}
    \label{theta_bar}
\end{equation}

\begin{equation}
    {\mathbf{\theta}}^u_t = \argmin\limits_{\bm\theta \in \Theta_t} \sbrak{\eta \matnorm{\mathbf{\theta} - \mathbf{\theta}_t}{\mathbf{W}_t}^2 + \ell(\inner{\mathbf{x}_t}{\mathbf{\theta}} , u)}
    \label{theta_u}
\end{equation}
where $\ell(z,y) = -y\log \mu(z) - (1-y) \log (1-\mu(z))$ is the cross-entropy loss and $\eta = (2+\mathrm{diam}(\Theta_t))^{-1}$. When the inequality in (\ref{equation:adaptivity-criterion}) holds, $\mathbf{\theta}_{t}$, $\mathbf{W}_t$, and $\mathbf{W}_t^i$ ($i\in [N]$) are updated as described in \emph{Steps 5-7} (Algorithm \ref{algo:adaptive-updates}). First, in \emph{Step 5}, $\bm\theta_{t+1}$ is computed by solving the following optimization problem up to a precision of $1/t$.
\begin{equation}
    \label{equation:optimization}
    \mathbf{\theta}_{t+1} = \argmin\limits_{\bm\theta \in \Theta_t} \sbrak{\eta\matnorm{\mathbf{\theta} - \mathbf{\theta}_t}{\mathbf{W}_t}^2 + \ell(\inner{\mathbf{x}_t}{\mathbf{\theta}} , y_{t})}
\end{equation}
Following this, $\mathbf{W}_t^i$ ($i\in [N]$) and $\mathbf{W}_t$ are updated in \emph{Step 6} and \emph{Step 7} as per their definitions provided earlier. When the inequality in (\ref{equation:adaptivity-criterion}) does not hold, $\mathcal{H}_t$ and $\Theta_t$ are updated as described in \emph{Steps 10-13} (Algorithm \ref{algo:adaptive-updates}). In \emph{Step 10}, since the inequality criterion failed, $\mathcal{H}_t$ is updated to $\mathcal{H}_{t+1}$ by appending the pair $(\mathbf{x}_t, y_t)$ to it. Using $\mathcal{H}_{t+1}$, in \emph{Step 11}, another estimate $\bm\theta_{t+1}^\mathcal{H}$ of $\bm\theta^\star$ is computed by minimizing the regularized cross-entropy loss (up to a precision of $1/t$).
\begin{equation}
\label{equation:optimization2}
    \mathbf{\theta}^\H_{t+1} = \argmin\limits_{\bm\theta} \sbrak{\sum\limits_{(\mathbf{x},y) \in \H_{t+1}} \ell(\inner{\mathbf{x}}{\mathbf{\theta}} , y) + \gamma_t(\delta)\twonorm{\mathbf{\theta}}^2}
\end{equation}
Using this estimate and the design matrix $\mathbf{V}_t^{\mathcal{H}}$ computed in \emph{Step 12}, the set $\Theta_t$ is updated to $\Theta_{t+1}$ in \emph{Step 13} by intersecting a confidence set of squared radius $\beta_t(\delta)= O(S^6dN\log (t/\delta))$ around the new estimate $\bm\theta_{t+1}^{\mathcal{H}}$ (which contains $\bm\theta^\star$ with probability $1-\delta$) with the initial set $\Theta_1 = \{\twonorm{\bm\theta}\leq S\}$. In Lemma $8$, \cite{Faury2022} show that $|\mathcal{H}_T| = \tilde O(\kappa dNS^6)$. The rounds corresponding to $\mathcal{H}_T$, therefore, incur at most $\tilde O(\kappa dNS^6)$ regret.
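
For reference, the loss terms entering (\ref{theta_bar}), (\ref{theta_u}), and (\ref{equation:optimization}) can be written out explicitly for a scalar argument $z$. The first identity below follows directly from the definition of $\ell$; the remaining two are a sketch that assumes $\mu$ is the logistic function $\mu(z) = (1+e^{-z})^{-1}$, so that $\dot{\mu}(z) = \mu(z)(1-\mu(z))$:
\[
\sum_{u\in\{0,1\}} \ell(z,u) = -\log\mu(z) - \log(1-\mu(z)), \qquad
\frac{\partial}{\partial z}\ell(z,y) = \mu(z) - y, \qquad
\frac{\partial^2}{\partial z^2}\ell(z,y) = \dot{\mu}(z).
\]
By the chain rule, the Hessian of $\bm\theta \mapsto \ell(\inner{\mathbf{x}_t}{\bm\theta}, y_t)$ is then $\dot{\mu}(\mathbf{x}_t^\top\bm\theta)\mathbf{x}_t\mathbf{x}_t^\top$, which, evaluated at $\bm\theta_{t+1}$, is exactly the rank-one increment added to $\mathbf{W}_t$ in Algorithm \ref{algo:adaptive-updates}.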

In Theorem \ref{theorem: Regret OFUL}, we provide a regret guarantee for \slateglincb\ and present its proof in Appendix \ref{appendix: proof_regret_oful}.

\begin{theorem}[Regret of \slateglincb]
\label{theorem: Regret OFUL}
Let $\mathcal{T}$ denote the set of rounds until round $T$ where the inequality condition in (\ref{equation:adaptivity-criterion}) fails, i.e., $\mathcal{T} = \{s\in [T]: (\mathbf{x}_s, y_s)\in \mathcal{H}_T\}$. Let $\mathbf{x}_{\star, t} = \argmax_{\mathbf{x}\in \mathcal{X}_t} \mu(\mathbf{x}^\top\bm\theta^\star)$ be the optimal slate at round $t\in [T]$. Under the diversity assumption (Assumption \ref{assumption: diversity}), at the end of $T$ rounds, with probability at least $1 - 6\delta$, the regret $R(T)$ of \slateglincb\ satisfies
\[
R(T) = \tilde O\bigg(SdN\sqrt{\sum\limits_{t\notin \mathcal{T}} \dot{\mu}(\mathbf{x}_{\star, t}^\top \bm\theta^\star)} + S^6 d^2 N^2 \kappa \bigg)
\] 
\end{theorem}


\textbf{Remark: }Let $\mathcal{T}$ be as defined in Theorem \ref{theorem: Regret OFUL}. The per-round time complexity of \slateglincb\ is $O((dN\log t)^2)$ for rounds $t\in [T]\setminus \mathcal{T}$ and $O(Ndt)$ for rounds $t\in \mathcal{T}$. Lemma $8$ in \cite{Faury2022} implies that $|\mathcal{T}| = \tilde O(\kappa d N S^6)$. Thus, the $O(Ndt)$ per-round complexity is incurred in only that many rounds.



\section{\slateglincbts}
\label{section:TS}
In this section, we present our second algorithm, \slateglincbts\ (Algorithm \ref{algo:TS}), based on the Thompson Sampling paradigm \citep{Thompson1933, Russo2018} used in bandit algorithms. \slateglincbts\ builds upon the \tsecolog\ algorithm (Algorithm $3$ in Appendix $D.2$, \cite{Faury2022}) while adapting to changing action sets using the update strategy in Algorithm \ref{algo:adaptive-updates}. \tsecolog\ adapts the Linear Thompson Sampling algorithm of \cite{Abeille2017} (Figure $1$ in \cite{Abeille2017}), which perturbs the estimated parameter vector by adding an appropriately transformed noise vector sampled from a multivariate distribution $\mathcal{D}^{TS}$ satisfying the concentration and anti-concentration properties in Definition $1$ of \cite{Abeille2017}. Following this, the optimal action (slate in our case) with respect to the perturbed parameter vector is chosen. While \tsecolog\ also achieves an optimal $O(\sqrt{T})$ regret guarantee for logistic reward models (for fixed action sets), similar to \adaofuecolog, it incurs a per-round computational cost proportional to the number of actions $K$ (recall that $K = 2^{\Omega(N)}$ in our setting) due to its selection at the slate level. To circumvent this, \slateglincbts\ operates at the slot level: for each slot $i\in [N]$, it perturbs the components of the estimated parameter vector corresponding to the $i^{\mathrm{th}}$ slot using a noise vector sampled independently of all other slots. This is followed by selecting the optimal item for each slot independently, thereby incurring an $N^{O(1)}$ per-round time complexity in choosing the slate. While the items for each slot are determined independently, similar to \slateglincb, \slateglincbts\ also estimates a single reward model and updates the parameter vector for this model jointly using the slate-level reward $y_t$, by employing the update strategy in Algorithm \ref{algo:adaptive-updates}.

The inputs to \slateglincbts\ are $T$, $\delta$, $S$, and $\mathcal{D}^{TS}$, where $T$ is the time horizon, i.e., the total number of rounds, $\delta$ is the error probability, $S$ is a known upper bound on $\twonorm{\bm\theta^\star}$, and $\mathcal{D}^{TS}$ is a multivariate distribution satisfying the properties in Definition $1$ of \cite{Abeille2017}.
During the course of the algorithm, \slateglincbts\ maintains vectors $\bm\theta_t$, matrices $\mathbf{W}_t$ and $\mathbf{W}_t^i$ ($i\in [N]$), and sets $\Theta_t, \mathcal{H}_t$, with exactly the same definitions as in \slateglincb.


Next, we go through the steps of \slateglincbts\ (Algorithm \ref{algo:TS}).
\emph{Steps 3-10} is where \slateglincbts\ differs significantly from \tsecolog. Instead of getting the set of arm features $\mathcal{X}_t$ (slates in our case) directly from the environment (as in \tsecolog), \slateglincbts\ receives $N$ different sets of items $\mathcal{X}_t^i, i\in [N]$ in \emph{Step 4}.
While \tsecolog\ samples one noise vector $\bm\eta\in \R^{dN}$ from $\mathcal{D}^{TS}$ and perturbs the estimated parameter vector $\bm\theta_t$ by adding (a scalar multiple of) $\mathbf{W}_t^{-1/2} \bm\eta$, \slateglincbts\ independently samples $N$ noise vectors $\bm\eta^1,\ldots, \bm\eta^N \in \R^d$ and perturbs the components $\bm\theta_t^i$ of $\bm\theta_t = (\bm\theta_t^1, \ldots, \bm\theta_t^N)$ (corresponding to the item features in the $i^{\mathrm{th}}$ slot) to $\tilde{\bm\theta}_t^i \in \R^d$ by adding (a scalar multiple of) $(\mathbf{W}_t^i)^{-1/2} \bm\eta^i$ (\emph{Steps 7} and \emph{8}). The algorithm keeps resampling these noise vectors until the perturbed vector $\tilde{\bm\theta}_t = (\tilde{\bm\theta}_t^1, \ldots, \tilde{\bm\theta}_t^N)$ belongs to the admissible set $\Theta_t$. Once this happens, in \emph{Step 11}, it picks for each slot the item $\mathbf{x}_t^i \in \mathcal{X}_t^i$ that is optimal with respect to the perturbed parameter vector $\tilde{\bm\theta}_t^i$. Note that the underlying optimization problem for slot $i$ only requires the candidate items in $\X_t^i$ and the perturbed vector $\tilde{\mathbf{\theta}}_t^i$, and thus can be solved independently and in parallel for all slots.
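
The per-slot decoupling can be seen from the following short calculation, which is a sketch assuming that the set of available slates is the product $\mathcal{X}_t = \mathcal{X}_t^1 \times \cdots \times \mathcal{X}_t^N$. Since the inner product splits across slot blocks,
\[
\max_{\mathbf{x} \in \mathcal{X}_t} \inner{\mathbf{x}}{\tilde{\bm\theta}_t}
= \max_{\mathbf{x}^1 \in \mathcal{X}_t^1, \ldots, \mathbf{x}^N \in \mathcal{X}_t^N} \sum_{i\in [N]} \inner{\mathbf{x}^i}{\tilde{\bm\theta}_t^i}
= \sum_{i\in [N]} \max_{\mathbf{x}^i \in \mathcal{X}_t^i} \inner{\mathbf{x}^i}{\tilde{\bm\theta}_t^i} ,
\]
so the slate assembled from the per-slot maximizers maximizes $\inner{\mathbf{x}}{\tilde{\bm\theta}_t}$ over all slates, and hence also $\mu(\inner{\mathbf{x}}{\tilde{\bm\theta}_t})$ whenever $\mu$ is increasing. Finding it takes $\sum_{i\in [N]} |\mathcal{X}_t^i|$ inner-product evaluations rather than $\prod_{i\in [N]} |\mathcal{X}_t^i|$.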

In \emph{Step 12}, we select the slate $\mathbf{x}_t = (\mathbf{x}_t^1, \ldots, \mathbf{x}_t^N)$, yielding a reward $y_t$. At this point, \slateglincbts\ calls the sub-routine described in Algorithm \ref{algo:adaptive-updates}, which updates $\bm\theta_t$, $\mathbf{W}_t$, $(\mathbf{W}_t^1, \ldots, \mathbf{W}_{t}^N)$, $\Theta_t$, and $\mathcal{H}_t$. We make a few additional remarks about \slateglincbts\ below.
\begin{algorithm}[!ht] 
\caption{\texttt{Slate-GLM-TS}}
\label{algo:TS} 
\begin{algorithmic}[1]
\STATE \textbf{Inputs:} $T, \delta, S, \mathcal{D}^{TS}$

\STATE Initialize $\mathbf{W}_1^1 = \ldots = \mathbf{W}_1^N = \mathbf{I}_d$, $\mathbf{W}_1 = \mathbf{I}_{dN}$, $\Theta_1 = \{\twonorm{\mathbf{\theta}}\leq S\}$, $\mathbf{\theta}_1 \in \Theta_1$, $\eta_t(\delta) = O(S^2Nd\log(t/\delta))$, and $\H_1 = \emptyset$
        
\FOR{each round $t \in [T]$}
\STATE Obtain the set of items $\mathcal{X}^i_t$, $\forall i \in [N]$
\STATE Set reject = True
\WHILE{reject}
    \STATE Sample $\mathbf{\eta}^{1}, \ldots, \mathbf{\eta}^N \overset{\mathrm{iid}}{\sim} \mathcal{D}^{TS}$ 
    \STATE Define $\tilde{\mathbf{\theta}}^{i}_t = \mathbf{\theta}^{i}_t + \eta_t(\delta)(\mathbf{W}_t^i)^{-1/2}\mathbf{\eta}^{i}$, $\forall i\in [N]$
    \STATE If $\tilde{\mathbf{\theta}}_t = (\tilde{\mathbf{\theta}}^{1}_t , \ldots , \tilde{\mathbf{\theta}}^{N}_t) \in \Theta_t$, set reject = False
\ENDWHILE
\STATE For each $i\in [N]$, find item $\mathbf{x}^i_t = \argmax\limits_{\mathbf{x} \in \X^i_t} \inner{\mathbf{x}}{\tilde{\mathbf{\theta}}^{i}_t}$

\STATE Select slate $\mathbf{x}_t = (\mathbf{x}^1_t, \ldots, \mathbf{x}^N_t)$ and get reward $y_t$
\STATE Obtain $\bm\theta_{t+1}$, $\mathbf{W}_{t+1}$, $(\mathbf{W}_{t+1}^1, \ldots, \mathbf{W}_{t+1}^N)$, $\Theta_{t+1}$, $\mathcal{H}_{t+1}$ by calling Algorithm \ref{algo:adaptive-updates} with inputs $\mathbf{x}_t$, $y_t$, $\bm\theta_t$, $\mathbf{W}_t$, $(\mathbf{W}_t^1, \ldots, \mathbf{W}_{t}^N)$, $\Theta_t$, $\mathcal{H}_t$
\ENDFOR
\end{algorithmic}
\end{algorithm}

\textbf{Remark: }It is easy to see that the per-round time complexity of \slateglincbts\ is $N(d\log T)^{O(1)}$. This is significantly lower than that of \texttt{TS-ECOLog}, which runs in time exponential in $N$. The improvement comes from the slot-level selection in \slateglincbts. This, along with the efficient estimation of $\bm\theta_t$ in Algorithm \ref{algo:adaptive-updates}, ensures that the algorithm has low per-round time complexity, making it useful in practical scenarios. This is validated by our synthetic and real-world experiments in Section \ref{section:experiments}. We also observe that, in almost all experiments we performed, the regret of \slateglincbts\ was competitive and better than that of most baselines.
% In fact, in many experiments, it was only second to \slateglincb\ which had the lowest regret. 
Even though we do not provide a theoretical guarantee for the regret of \slateglincbts, in Appendix \ref{appendix: regret_proof_TS} we provide a fixed-arms version of \slateglincbts\ called \slateglincbtsfixed, which, like \tsecolog, operates in the non-contextual setting, i.e., the action (slate) features do not change over time. It uses the short warm-up procedure from \tsecolog\ and the slot-level selection technique from \slateglincbts, resulting in a per-round time complexity linear in $N$. By utilizing the multiplicative equivalence of $\bm W_t$ and $\mathrm{diag}(\bm W_t^1, \ldots , \bm W_t^N)$ that we show in the proof of Theorem \ref{theorem: Regret OFUL} (using the diversity assumption (Assumption \ref{assumption: diversity})), and adapting the proof of \tsecolog\ (Theorem $5$, \cite{Faury2022}), we prove that its regret has an optimal ($O(\sqrt{T})$) dependence on the number of rounds $T$. For brevity, we discuss details of \slateglincbtsfixed\ (Algorithm \ref{algo:TS-Fixed}) and its regret guarantee (Theorem \ref{theorem:TS}) in Appendix \ref{appendix: regret_proof_TS}.
