\section{Additional Experiments and Experimental Details}
\label{appendix:experiments}
This section provides additional plots that support the experiments in Section \ref{section:experiments}, together with further details of our experimental setup.

\begin{figure*}
	\centering
	\begin{subfigure}[b]{0.33\columnwidth}  
		\centering 
		\includegraphics[width=56mm]{Plots/Finite_Contexts_Error.pdf}
		\caption{{\small Regret vs.\ $T$: Finite Context Setting}}   
		\label{fig:finite_error}
	\end{subfigure}
	%\vskip\baselineskip
	\hfill
	\begin{subfigure}[b]{0.33\columnwidth}   
		\centering 
	\includegraphics[width=56mm]{Plots/Infinite_Contexts_Error.pdf}
		\caption{{\small Regret vs.\ $T$: Infinite Context Setting}}   
		\label{fig:infinite_error}
	\end{subfigure}
	\hfill
	\begin{subfigure}[b]{0.33\columnwidth}
		\centering
		\includegraphics[width=56mm]{Plots/non_contextual_our_error.pdf}
		\caption{{\small Regret vs.\ $T$: Fixed-Arm Setting (deviations shown for \texttt{Slate-GLM-OFU} and \texttt{Slate-GLM-TS})}}
		\label{fig:non_contextual_our_error}
	\end{subfigure}
	\vskip\baselineskip
	\begin{subfigure}[b]{0.33\columnwidth}  
		\centering 
		\includegraphics[width=56mm]{Plots/non_contextual_mps_error.pdf}
		\caption{{\small Regret vs.\ $T$: Fixed-Arm Setting (deviations shown for \texttt{MPS})}}
		\label{fig:non_contextual_mps_error}
	\end{subfigure}
	%\vskip\baselineskip
	\hfill
	\begin{subfigure}[b]{0.33\columnwidth}   
		\centering 
		\includegraphics[width=56mm]{Plots/per_round_time_average_error.pdf}
				\caption{{\small Average running time (per-round)}}   
		\label{fig:average_time_error}
	\end{subfigure}
	\hfill
	\begin{subfigure}[b]{0.33\columnwidth}
		\centering
		\includegraphics[width=56mm]{Plots/per_round_time_max_error.pdf}
		\caption{{\small Maximum running time (per-round)}}
		\label{fig:max_time_error}
	\end{subfigure}
    \vskip\baselineskip
	\begin{subfigure}[b]{0.33\columnwidth}  
		\centering 
		\includegraphics[width=56mm]{Plots/pull_time_average_error.pdf}
		\caption{{\small Average time taken to pull an arm}} 
		\label{fig:pull_time_avg_error}
	\end{subfigure}
	%\vskip\baselineskip
	\hfill
	\begin{subfigure}[b]{0.33\columnwidth}   
		\centering 
		\includegraphics[width=56mm]{Plots/pull_time_max_error.pdf}
				\caption{{\small Maximum time taken to pull an arm}}   
		\label{fig:pull_time_max_error}
	\end{subfigure}
	\hfill
	\begin{subfigure}[b]{0.33\columnwidth}
		\centering
		\includegraphics[width=56mm]{Plots/update_time_average_error.pdf}
		\caption{{\small Average time taken to update parameters}}
		\label{fig:update_time_average_error}
	\end{subfigure}
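	\caption{Additional regret and running-time plots for the experiments in Section \ref{section:experiments}. Shaded regions represent two standard deviations.}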
    \label{fig:Plots_appendix}
    %{\small Simple Regret} 
\end{figure*}

In all of the figures, the shaded regions represent two standard deviations. Figures \ref{fig:finite_error} and \ref{fig:infinite_error} depict the graphs from \textbf{Experiment 1} (Section \ref{section:experiments}) wherein we compare our algorithms \texttt{Slate-GLM-OFU} and \texttt{Slate-GLM-TS} to their counterparts \texttt{ada-OFU-ECOLog} and \texttt{TS-ECOLog} in the finite and infinite context settings.

Figures \ref{fig:non_contextual_our_error} and \ref{fig:non_contextual_mps_error} depict the graphs from \textbf{Experiment 3} (Section \ref{section:experiments}), wherein we compare our algorithms \texttt{Slate-GLM-OFU}, \texttt{Slate-GLM-TS}, and \texttt{Slate-GLM-TS-Fixed} to several state-of-the-art non-contextual logistic bandit algorithms. In Figure \ref{fig:non_contextual_our_error}, we show the uncertainty only for \texttt{Slate-GLM-OFU} and \texttt{Slate-GLM-TS}. \texttt{Slate-GLM-OFU} performs best, and \texttt{MPS} is the only algorithm with comparable performance. \texttt{Slate-GLM-TS}, on the other hand, performs worse than \texttt{ada-OFU-ECOLog} and \texttt{MPS}, while being on par with \texttt{TS-ECOLog}. However, Figure \ref{fig:non_contextual_mps_error} shows that the variance of \texttt{MPS} is very high, which makes the algorithm less reliable in practice.

Figures \ref{fig:average_time_error} and \ref{fig:max_time_error} show the average and maximum per-round running times of the algorithms, together with bands of two standard deviations. The running times of both \texttt{ada-OFU-ECOLog} and \texttt{TS-ECOLog} increase exponentially with the number of slots. Further, the significant gap between the average and maximum per-round running times of \texttt{Slate-GLM-OFU} and \texttt{Slate-GLM-TS} (see Table \ref{tab:running_times}) indicates that the typical per-round time is much lower than the maximum. As mentioned in the main paper, we calculate the per-round running time of an algorithm as the sum of its per-round pull and update times. Figures \ref{fig:pull_time_avg_error} and \ref{fig:pull_time_max_error} show the average and maximum per-round pull times, while Figure \ref{fig:update_time_average_error} displays the average per-round update times. The pull time of \texttt{ada-OFU-ECOLog} and \texttt{TS-ECOLog} increases exponentially with the number of slots, whereas the update times are similar for all algorithms. Hence, the differences in per-round running times can be attributed mainly to the pull times of each algorithm, which is in line with our theoretical claims. For clarity, Table \ref{tab:running_times} reports the average and maximum per-round running times of each algorithm.
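
To make the timing quantities concrete, one way to write them down is given below. The symbols $\tau_t$, $\tau_t^{\mathrm{pull}}$, and $\tau_t^{\mathrm{upd}}$ are notation introduced here purely for illustration and do not appear in the main paper. Writing $\tau_t^{\mathrm{pull}}$ and $\tau_t^{\mathrm{upd}}$ for the time spent in round $t$ on pulling an arm and on updating the parameters, respectively, the per-round running time and its two summaries over a horizon of $T$ rounds are
\[
\tau_t = \tau_t^{\mathrm{pull}} + \tau_t^{\mathrm{upd}}, \qquad
\bar{\tau} = \frac{1}{T}\sum_{t=1}^{T} \tau_t, \qquad
\tau_{\max} = \max_{t \in [T]} \tau_t .
\]
By linearity, the average running time is exactly the sum of the average pull time and the average update time. For the maximum, only the inequality $\tau_{\max} \le \max_{t \in [T]} \tau_t^{\mathrm{pull}} + \max_{t \in [T]} \tau_t^{\mathrm{upd}}$ holds in general, since the two maxima need not be attained in the same round.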

\begin{table}[H]
    \centering
\resizebox{\columnwidth}{!}{
    \begin{tabular}{ccccccccc}
    \hline
      \multirow{2}{*}{\textbf{Slots}}
      &
      \multicolumn{2}{c}
      {\texttt{ada-OFU-ECOLog}}
      &
      \multicolumn{2}{c}
      {\texttt{Slate-GLM-OFU}}
      &
      \multicolumn{2}{c}
      {\texttt{TS-ECOLog}}
      &
      \multicolumn{2}{c}
      {\texttt{Slate-GLM-TS}}
      \\
      \cline{2-9}
      &
      \multicolumn{1}{c}{\textbf{Average (ms)}}
      &
      \multicolumn{1}{c}{\textbf{Maximum (ms)}}
      &
      \multicolumn{1}{c}{\textbf{Average (ms)}}
      &
      \multicolumn{1}{c}{\textbf{Maximum (ms)}}&
      \multicolumn{1}{c}{\textbf{Average (ms)}}
      &
      \multicolumn{1}{c}{\textbf{Maximum (ms)}}
      &
      \multicolumn{1}{c}{\textbf{Average (ms)}}
      &
      \multicolumn{1}{c}{\textbf{Maximum (ms)}}
      \\
      \hline
      3 
      & 
      \multicolumn{1}{c}{$4.3 \pm 0.2$}
      & 
      \multicolumn{1}{c}{$23.0 \pm 24.5$}
      & 
      \multicolumn{1}{c}{$0.3 \pm 0.0$}
      & 
      \multicolumn{1}{c}{$9.5 \pm 12.5$}
      & 
      \multicolumn{1}{c}{$3.1 \pm 0.1$}
      & 
      \multicolumn{1}{c}{$36.6 \pm 47.7$}
      & 
      \multicolumn{1}{c}{$0.6 \pm 0.1$}
      & 
      \multicolumn{1}{c}{$19.2 \pm 33.4$}
      \\
      \hline
      4 
      & 
      \multicolumn{1}{c}{$47.5 \pm 36.4$}
      & 
      \multicolumn{1}{c}{$341.8 \pm 154.8$}
      & 
      \multicolumn{1}{c}{$0.8 \pm 0.9$}
      & 
      \multicolumn{1}{c}{$10.5 \pm 7.3$}
      & 
      \multicolumn{1}{c}{$30.3 \pm 15.6$}
      & 
      \multicolumn{1}{c}{$316.7 \pm 57.1$}
      & 
      \multicolumn{1}{c}{$2.2 \pm 1.1$}
      & 
      \multicolumn{1}{c}{$22.8 \pm 5.4$}
      \\
      \hline
      5 
      & 
      \multicolumn{1}{c}{$221.4 \pm 30.1$}
      & 
      \multicolumn{1}{c}{$1075.7 \pm 57.8$}
      & 
      \multicolumn{1}{c}{$0.6 \pm 0.2$}
      & 
      \multicolumn{1}{c}{$12.0 \pm 9.7$}
      & 
      \multicolumn{1}{c}{$184.1 \pm 121.8$}
      & 
      \multicolumn{1}{c}{$905.3 \pm 126.5$}
      & 
      \multicolumn{1}{c}{$1.2 \pm 0.5$}
      & 
      \multicolumn{1}{c}{$13.8 \pm 11.5$}
      \\
      \hline
      6 
      & 
      \multicolumn{1}{c}{$1655.6 \pm 36.3$}
      & 
      \multicolumn{1}{c}{$3335.5 \pm 494.6$}
      & 
      \multicolumn{1}{c}{$0.9 \pm 0.2$}
      & 
      \multicolumn{1}{c}{$35.8 \pm 28.9$}
      & 
      \multicolumn{1}{c}{$1309.4 \pm 55.3$}
      & 
      \multicolumn{1}{c}{$2528.3 \pm 278.2$}
      & 
      \multicolumn{1}{c}{$1.9 \pm 0.2$}
      & 
      \multicolumn{1}{c}{$68.3 \pm 71.1$}
      \\
      \hline
    \end{tabular}
    }
    \caption{Average and maximum per-round running times (in milliseconds), averaged over 10 different seeds for sampling rewards. Each entry is reported with two standard deviations.}
    \label{tab:running_times}
\end{table}
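
As a rough illustration of the growth rates, the ratios below are computed directly from the average running times in Table \ref{tab:running_times}, comparing $6$ slots to $3$ slots. They are rounded to one decimal place, ignore the reported deviations, and should be read as indicative only.
\[
\begin{array}{lll}
\texttt{ada-OFU-ECOLog}: & 1655.6 / 4.3 \approx 385.0, & \\
\texttt{TS-ECOLog}: & 1309.4 / 3.1 \approx 422.4, & \\
\texttt{Slate-GLM-OFU}: & 0.9 / 0.3 = 3.0, & \\
\texttt{Slate-GLM-TS}: & 1.9 / 0.6 \approx 3.2. &
\end{array}
\]
Doubling the number of slots thus multiplies the average running time of the two \texttt{ECOLog} baselines by a factor of several hundred, while that of our algorithms grows by a factor of roughly three.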


We now provide additional details about our experimental setup. In \textbf{Experiment 3}, we implement \texttt{Ordered Slate Bandit} and \texttt{ETC-Slate} from \cite{Kale2010} and \cite{Rhuggenaath2020}, respectively. Since these algorithms are designed for semi-bandit feedback, we modify them so that they can be run in our setting. These modifications are detailed below.

\textbf{\texttt{Ordered Slate Bandit}}: The original algorithm in \cite{Kale2010} assumes that there exists a base set $\mathcal{X}$ with $\modulus{\mathcal{X}} = K$, from which the learner picks a slate of $N$ items. Their algorithm therefore treats each base item as equally likely to be placed in any slot, and starts with the initial distribution $P$ such that $P_{i,j} = 1 \;\forall i\in[N] \;\forall j\in[K]$. We cannot make the same assumption, since we receive a different set of items $\mathcal{X}^i_t$ for each slot $i\in[N]$. Thus, we change the initial distribution to $P$ such that $P_{i,j} = 1$ if and only if $j \in [K(i-1)+1, Ki]$, where the items of all slots are indexed jointly by $j \in [NK]$. This modification restricts the items that can be selected for a particular slot. A similar modification is made for the exploratory distribution in each round.

The construction of the loss matrix also differs significantly. Since the algorithm is designed for semi-bandit feedback, it propagates the loss of the item chosen in each slot at each round. Using the fact that the loss is the additive inverse of the reward, we have two choices for the loss to propagate. Since we operate in the logistic setting, the obvious choice is to propagate the non-linear losses. However, since the total loss of a slate is assumed to be the sum of the losses obtained in each slot, the linear loss seems more suitable. We experiment with both choices and find that the algorithm with non-linear losses incurs very high regret. Hence, we only compare our algorithms to the Ordered Slate Bandit algorithm with linear losses, which we refer to as \texttt{Ordered Slate Bandit}.
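
As a small illustration of the slot-restricted initialization, take $N = 2$ and $K = 3$ (values chosen here purely for illustration), and index the $NK = 6$ items so that slot $1$ owns items $1, 2, 3$ and slot $2$ owns items $4, 5, 6$. Reading the rule $P_{i,j} = 1$ if and only if $j \in [K(i-1)+1, Ki]$ with all remaining entries set to $0$ (an assumption of this sketch), we obtain
\[
P = \left(\begin{array}{cccccc}
1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1
\end{array}\right),
\]
so that row $i$ places mass only on the items available to slot $i$. In contrast, the original initialization of \cite{Kale2010} sets every entry of the $N \times K$ matrix to $1$.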

\textbf{\texttt{ETC-Slate}}: The original algorithm in \cite{Rhuggenaath2020} is also designed for semi-bandit feedback, wherein the reward for each slot is assumed to be sampled from a distribution such as the uniform distribution (see Example 1 in \cite{Rhuggenaath2020}). In our case, however, there is no notion of a reward distribution at the slot level. To create one, we assume that the reward for slot $i$ is sampled from $\mathcal{N}({\bm{x}_s^i}^\top\thetastar^i , 0.0001)$. This ensures that, in expectation, the reward attributed to a particular slot is the linear reward for the item played. We set the slate-level reward function $f$ to be the sigmoid function applied to the sum of the slot-level rewards, and then proceed with the algorithm. We find that \texttt{ETC-Slate} incurs very high regret, and hence do not include the algorithm in our comparisons.
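
The construction above can be summarized as follows, where $r^i_s$ denotes the synthetic slot-level reward and $\sigma$ the sigmoid function (both symbols are introduced here for illustration only):
\[
r^i_s \sim \mathcal{N}\big({\bm{x}_s^i}^\top\thetastar^i , 0.0001\big), \qquad
f(r^1_s, \dots, r^N_s) = \sigma\Big(\sum_{i=1}^{N} r^i_s\Big), \qquad
\sigma(z) = \frac{1}{1 + e^{-z}} .
\]
Since $\sigma$ is $\frac{1}{4}$-Lipschitz, we have
\[
\Big| f(r^1_s, \dots, r^N_s) - \sigma\Big(\sum_{i=1}^{N} {\bm{x}_s^i}^\top\thetastar^i\Big) \Big|
\le \frac{1}{4}\,\Big| \sum_{i=1}^{N} \big(r^i_s - {\bm{x}_s^i}^\top\thetastar^i\big) \Big| ,
\]
so with noise this small, the slate-level reward seen by \texttt{ETC-Slate} stays close to the mean logistic reward of the played slate.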

\section{Empirical Validation of the Diversity Assumption (Assumption \ref{assumption: diversity})}
\label{appendix:empirical-validation}

In this section, we show empirically that our (instance- and algorithm-dependent) diversity assumption holds for many instances. We set the number of slots to $N = 3$ and fix the number of items in each slot, $\modulus{\mathcal{X}^i_t}$, to $5$. The dimension of the items in each slot is fixed to $5$, so that the slate has dimension $d = 15$. The items for each slot are randomly sampled from $[-1,1]^5$ and normalized to have norm $1/\sqrt{3}$, while $\thetastar$ is randomly sampled from $[-1,1]^{15}$. We operate in the infinite context setting, wherein the items in each slot change in every round (see \textbf{Experiment 1} in Section \ref{section:experiments} for more details). We run both \texttt{Slate-GLM-OFU} and \texttt{Slate-GLM-TS} 100 times with different seeds for a horizon of $T = 10000$ rounds. For each run, we plot the minimum eigenvalue of $\bm{W}^i_t$ for $i \in [3]$ as a function of the round $t$, and show the results in Figure \ref{appendix_fig: eigenvalues}. The plots show (near) linear growth in $t$ of the minimum eigenvalues of the matrices $\bm{W}^i_t$ for all slots $i \in [3]$ over the rounds $t \in [T]$.
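
Two short remarks may help in reading this setup. The notation $\bm{x}^i_t$ for the item played in slot $i$, $\bm{x}_t$ for the concatenated slate vector, and the constant $\rho$ are introduced here for illustration only. First, assuming the slate vector is the concatenation of the $N = 3$ slot-level items, the normalization to $1/\sqrt{3}$ gives the slate unit norm:
\[
\|\bm{x}_t\|_2^2 = \sum_{i=1}^{3} \|\bm{x}^i_t\|_2^2 = 3 \cdot \frac{1}{3} = 1 .
\]
Second, linear growth of the minimum eigenvalue can be read informally as the existence of a constant $\rho > 0$ such that
\[
\lambda_{\min}\big(\bm{W}^i_t\big) \ge \rho\, t \qquad \text{for all } i \in [3] ,
\]
at least for sufficiently large $t$. This is only a shorthand for what the plots suggest; the precise statement is the one given in Assumption \ref{assumption: diversity}.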

\begin{figure}
    \centering
    \includegraphics[width = \columnwidth]{Plots/eigenvalues.pdf}
    \caption{Minimum eigenvalue of $\bm{W}^i_t$ as a function of the round $t$ for 100 independent runs of \texttt{Slate-GLM-OFU} and \texttt{Slate-GLM-TS}, illustrating the algorithm-dependent diversity assumption.}
    \label{appendix_fig: eigenvalues}
\end{figure}

