\section{INTRODUCTION} \label{sec:intro}
The multi-armed bandit (MAB) \citep{Auer2002FinitetimeAO,Lattimore2020BanditA} is a classic sequential decision-making model characterized by the exploration-exploitation tradeoff. Pure exploration \citep{EvenDar2006ActionEA,Soare2014BestArmII,Bubeck2009PureEI}, also known as best arm identification, is an important variant of the MAB problem in which the objective is to identify the arm with the maximum expected reward. While most existing bandit solutions are designed for a centralized setting (i.e., all data is available at a central server), federated bandits are attracting growing interest for both regret minimization \citep{Wang2019DistributedBL,Li2022CommunicationED,He2022ASA} and pure exploration \citep{Hillel2013DistributedEI,Tao2019CollaborativeLW,Du2021CollaborativePE}, driven by the expanding scale of applications and public concerns about privacy. Specifically, federated pure exploration considers $M$ agents that collaboratively identify the best arm under limited communication bandwidth while keeping each agent's raw data local. The major challenge in federated bandits is the conflict between timely data/model aggregation, which is needed for low sample complexity, and communication efficiency among decentralized agents. Balancing model updates against communication is therefore vital to solving the problem efficiently.

Prior works on distributed/federated pure exploration \citep{Hillel2013DistributedEI,Tao2019CollaborativeLW,Reda2022NearOptimalCL,Du2021CollaborativePE} all focus on synchronous communication protocols, in which all agents participate simultaneously in each communication round to exchange their latest observations with a central server (federated setting) or with other agents (distributed setting). However, synchronous protocols are inefficient in real-world applications for two reasons: 1) some agents may not interact with the environment in certain rounds, and 2) each global synchronization must wait until the slowest agent responds to the server, which incurs significant latency, especially when the number of agents is large and communication is unstable.

To address these challenges of model updating and communication, in this paper we study federated pure exploration under \emph{asynchronous} communication, in both the stochastic multi-armed bandit and the linear bandit settings. To reduce communication costs, we propose novel asynchronous event-triggered communication protocols in which each agent sends local updates to, and receives aggregated updates from, the server independently of other agents, so that global synchronization is no longer needed. This improves robustness against delays and unavailability of agents. Moreover, communication is event-triggered: an agent communicates only when it has accumulated a significant amount of new observations, which reduces communication costs while maintaining low sample complexity.
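
To illustrate the idea, the following minimal Python sketch implements one possible event-triggered protocol for the MAB case. It is not the trigger used by \texttt{FAMABPE} or \texttt{FALinPE}, which this section does not specify: the classes \texttt{Server} and \texttt{Agent}, the threshold \texttt{alpha}, and the rule that an agent uploads once its unsent pulls of an arm exceed \texttt{alpha} times that arm's aggregated count at the last synchronization are all hypothetical. Each upload is paired with a download of the current aggregate, and no agent ever waits for another.

```python
from typing import List, Tuple


class Server:
    """Hypothetical server holding aggregated per-arm statistics."""

    def __init__(self, n_arms: int) -> None:
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms

    def receive(self, counts: List[int], sums: List[float]) -> Tuple[List[int], List[float]]:
        # Merge one agent's buffered statistics; no other agent needs to respond.
        for k, (c, s) in enumerate(zip(counts, sums)):
            self.counts[k] += c
            self.sums[k] += s
        # Return a copy of the current aggregate (the download half of the exchange).
        return list(self.counts), list(self.sums)


class Agent:
    """Hypothetical agent that communicates only when an event trigger fires."""

    def __init__(self, n_arms: int, alpha: float = 1.0) -> None:
        self.alpha = alpha                  # hypothetical trigger threshold
        self.synced_counts = [0] * n_arms   # aggregate counts at the last sync
        self.synced_sums = [0.0] * n_arms
        self.new_counts = [0] * n_arms      # local pulls not yet uploaded
        self.new_sums = [0.0] * n_arms
        self.uploads = 0

    def should_upload(self, arm: int) -> bool:
        # Fire when the unsent pulls of `arm` are large relative to what the
        # aggregate already contained at the last synchronization.
        return self.new_counts[arm] > self.alpha * max(1, self.synced_counts[arm])

    def observe(self, arm: int, reward: float, server: Server) -> None:
        self.new_counts[arm] += 1
        self.new_sums[arm] += reward
        if self.should_upload(arm):
            # Upload the local buffer, download the aggregate, clear the buffer.
            self.synced_counts, self.synced_sums = server.receive(self.new_counts, self.new_sums)
            n_arms = len(self.new_counts)
            self.new_counts = [0] * n_arms
            self.new_sums = [0.0] * n_arms
            self.uploads += 1


if __name__ == "__main__":
    server = Server(n_arms=2)
    agent = Agent(n_arms=2, alpha=1.0)
    for _ in range(100):
        agent.observe(arm=0, reward=1.0, server=server)
    # Uploads fire after 2, 5, 11, 23, 47 and 95 pulls.
    print(agent.uploads, server.counts[0], agent.new_counts[0])  # prints: 6 95 5
```

Because the trigger is relative to the aggregated count, the gaps between uploads roughly double in this single-agent run, so the number of uploads grows roughly logarithmically with the number of pulls. With several agents sharing one \texttt{Server}, each download also includes the other agents' observations, which raises the threshold that the next upload must clear.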

With these new communication protocols, we propose two asynchronous federated pure exploration algorithms, Federated Asynchronous MAB Pure Exploration (\texttt{FAMABPE}) and Federated Asynchronous Linear Pure Exploration (\texttt{FALinPE}), for MAB and linear bandits, respectively. We prove that, in the fixed confidence setting \citep{Gabillon2012BestAI,Soare2014BestArmII,Xu2017AFA}, these algorithms return an $(\epsilon,\delta)$-best arm, i.e., an arm that is $\epsilon$-close to the best arm with probability at least $1-\delta$, with \emph{efficient communication cost}, \emph{efficient switching cost}, and \emph{near-optimal sample complexity}. Moreover, we empirically validate the theoretical results on synthetic and real-world data. The experimental results show that our event-triggered communication strategy achieves low communication cost while only moderately increasing the sample complexity compared with synchronous baselines.
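
For concreteness, the $(\epsilon,\delta)$ guarantee can be written as follows, under notation assumed here only for illustration: $K$ arms with expected rewards $\mu_1,\dots,\mu_K$, a best arm $i^\star$, the arm $\hat{\imath}$ returned when the algorithm stops, and an additive notion of $\epsilon$-closeness.
\begin{equation*}
  \Pr\big( \mu_{i^\star} - \mu_{\hat{\imath}} \le \epsilon \big) \ge 1 - \delta,
  \qquad i^\star \in \arg\max_{i \in [K]} \mu_i .
\end{equation*}
Sample complexity then refers to the total number of arm pulls, summed over all $M$ agents, before $\hat{\imath}$ is returned.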

