\section{EXPERIMENTS} \label{syn}


\begin{figure}[t]
\vspace{-3mm}
	\centering  % center the figure
\subfigbottomskip=2pt % vertical spacing between the two rows of subfigures
\subfigcapskip=4pt % spacing between a subfigure and its caption
\subfigure[MAB]{
\includegraphics[width=0.95\linewidth]{MAB_synthetic.png}\label{fig_synthetic_MAB}}
	\subfigure[Linear]{
\includegraphics[width=0.95\linewidth]{Linear_synthetic.png}\label{fig_synthetic_Linear}}
\caption{Synthetic data: Experimental results for federated MAB and federated linear bandits.}
\label{fig:synthetic}
\vspace{-5mm}
\end{figure}

To empirically validate the communication and sample efficiency of \texttt{FAMABPE} and \texttt{FALinPE}, we conduct experiments on both synthetic and real-world datasets. We compare our algorithms with baselines in which the active agent shares its data with the other agents via the server in every round. In the MAB setting, we compare \texttt{FAMABPE} with single-agent UGapEc and synchronous UGapEc \citep{Gabillon2012BestAI}. In the linear bandit setting, we compare \texttt{FALinPE} with single-agent LinGapE and synchronous LinGapE \citep{Xu2017AFA}. We note that asynchronous algorithms typically incur a larger communication cost than synchronous ones under the same regret or sample complexity guarantee, as also acknowledged in prior work on regret minimization \citep{Li2021AsynchronousUC,He2022ASA,li2023learning}. The synchronous algorithms are therefore included mainly as a reference that shows the performance attainable in the easier synchronous setting. We run each algorithm $10$ times and plot the average results.

\subsection{Experiments on synthetic data}

In this section, we report experiments on synthetic data for federated MAB and federated linear bandits.

\subsubsection{Experiment Setup}
\paragraph{MAB }
We simulate the federated MAB of Section \ref{section2.1} with $\sigma = 0.3$, $\delta = 0.05$, $\epsilon = 0$, $K = 5$, and $M = 10$. We sample the optimal arm from a uniform distribution and selectively sample the non-optimal arms to guarantee the reward gap. For synchronous UGapEc, we set the communication frequency to once every $100$ rounds: at the end of every $100$ rounds, the agents upload their exploration results to the server and download the other agents' exploration results from the server. The communication cost of this naive synchronous algorithm is $C(\tau) = \tau/50$, because there are $\tau/(100M)$ communication episodes and, in each episode, the agents upload and download data $2M$ times in total. The setup of \texttt{FAMABPE} follows Theorem \ref{theorem1}, and in each round the active agent $m_t$ is sampled uniformly from $\M$.
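
As a quick check of the expression for the synchronous baseline's communication cost, multiplying the number of communication episodes by the number of uploads and downloads per episode gives
\[
C(\tau) = \frac{\tau}{100M} \cdot 2M = \frac{\tau}{50},
\]
which does not depend on $M$. For instance, under the purely illustrative assumption $\tau = 10^4$ (not a value measured in our experiments) and $M = 10$, there are $\tau/(100M) = 10$ episodes with $2M = 20$ communications each, so $C(\tau) = 200$.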

\paragraph{Linear bandits }
Similar to the MAB case, we simulate the federated linear bandits with $d = 5$; all other parameters are the same as in the MAB setting. We first sample the model parameter $\t^*$ from a uniform distribution. Then, we sample the context of the optimal arm and selectively sample the non-optimal arms to guarantee the reward gap. Synchronous LinGapE is set up analogously to synchronous UGapEc. The setup of \texttt{FALinPE} follows Theorem \ref{theorem2}, and the active agent in the linear case is also sampled uniformly from $\M$.
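
One way to formalize the arm-set construction above is the following sketch. It assumes that ``guaranteeing the reward gap'' means that every non-optimal arm is at least a target gap worse than the optimal arm; the symbols $x_k$ (context of arm $k$) and $\Delta$ (target gap) are illustrative notation introduced only here. Under this assumption, a candidate context $x_k$ with $k \neq k^*$ would be kept only if
\[
\langle x_{k^*} - x_k, \t^* \rangle \geq \Delta,
\]
and would be resampled otherwise.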

\subsubsection{Experiment Results}
\paragraph{MAB } The results for federated MAB are shown in Figure \ref{fig_synthetic_MAB}. All algorithms correctly identify the best arm, i.e., $\hat{k}^* = k^*$. We report the sample complexity and communication cost for reward gaps from $0.1$ to $0.5$. Single-agent UGapEc achieves the smallest sample complexity. Synchronous UGapEc requires a slightly larger sample complexity when the gap equals $0.1$ and an almost identical one when the gap is between $0.2$ and $0.5$. Compared with these baselines, \texttt{FAMABPE} has a slightly larger sample complexity but achieves the lowest communication cost: its communication cost only ranges from $100$ to $120$ as the gap decreases from $0.5$ to $0.1$. \texttt{FAMABPE} is the only algorithm that achieves near-optimal sample complexity and efficient communication cost in a fully asynchronous environment.

\paragraph{Linear bandits } The results for federated linear bandits are provided in Figure \ref{fig_synthetic_Linear}. All algorithms correctly identify the best arm, i.e., $\hat{k}^* = k^*$. Similar to the MAB case, single-agent LinGapE and synchronous LinGapE achieve the lowest sample complexity. In comparison, \texttt{FALinPE} requires a relatively large sample complexity, especially when the gap equals $0.1$. Furthermore, the communication cost of synchronous LinGapE is larger than that of \texttt{FALinPE} when the gap equals $0.1$ and smaller otherwise.

\subsection{Experiments on real-world data}

\begin{figure}
	\centering  
\includegraphics[width=0.95\linewidth]{real_data.png}
\caption{Experimental results on MovieLens for federated linear bandits.}
\label{fig:real}
\end{figure}

In this section, we report an additional experiment on a real-world dataset in the federated linear bandit setting.

\subsubsection{Experiment Setup}
We use the MovieLens 20M dataset \citep{Harper2016TheMD} for this experiment. Following \citep{Li2021AsynchronousUC}, we preprocess the data and extract item features. Specifically, we keep users with over $3{,}000$ observations, which results in a dataset with $54$ users, $26{,}567$ items (movies), and $214{,}729$ interactions. For each item, we extract TF-IDF features from its associated tags and apply PCA to obtain item features of dimension $d=25$. We treat all items with non-zero ratings as positive feedback (reward $r=1$) and use ridge regression to learn $\t^*$ from the extracted item features and their 0/1 rewards. To construct an arm set, we follow the same procedure as in the simulation of Section \ref{syn}: we first sample an optimal arm and then selectively sample non-optimal arms to guarantee the reward gap.
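
For concreteness, a standard closed-form ridge regression estimate can be written as follows. The notation is introduced only for this sketch: the $n \times d$ matrix $X$ stacks the extracted item features of the $n$ feature--reward pairs, $r \in \{0,1\}^n$ collects the corresponding rewards, and $\lambda > 0$ is a regularization parameter whose value is not specified here.
\[
\hat{\theta} = (X^\top X + \lambda I_d)^{-1} X^\top r .
\]
The resulting estimate then plays the role of $\t^*$ when constructing the arm set.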


The baseline algorithms considered in this section are the same as those in the synthetic-data experiments (Section \ref{syn}). We set $d = 25$, $K=10$, and $\epsilon = 0.05$; the other parameters of the federated linear bandits are identical to those in Section \ref{syn}. We report the average results over $10$ runs.

\subsubsection{Experiment Results}
The results for federated linear bandits are shown in Figure \ref{fig:real}. In each run, every algorithm identifies the best arm $k^*$. Similar to the results on synthetic data, single-agent LinGapE and synchronous LinGapE enjoy the lowest sample complexity, while \texttt{FALinPE} requires a relatively large sample complexity. Moreover, the trend suggests that \texttt{FALinPE}'s communication cost is smaller than that of synchronous LinGapE when the expected reward gap is smaller than or equal to $0.15$. Note that synchronous LinGapE only works in a synchronous environment; hence, \texttt{FALinPE} is the \textit{only} known federated linear bandit algorithm that simultaneously achieves near-optimal sample complexity and efficient communication cost in the fully asynchronous environment.

