\section{RELATED WORK}
\textbf{Pure exploration }
The pure exploration problem in the single-agent setting has been studied extensively \cite{Mannor2004TheSC, EvenDar2006ActionEA, Bubeck2009PureEI, Gabillon2011MultiBanditBA, Gabillon2012BestAI, Jamieson2013lilU, Garivier2016OptimalBA, Chen2016TowardsIO}, primarily within the multi-armed bandit (MAB) framework. Subsequently, \cite{Soare2014BestArmII, Xu2017AFA, Tao2018BestAI, Kazerouni2019BestAI, Fiez2019SequentialED, Degenne2020GamificationOP, Jedra2020OptimalBI} extended these investigations to linear bandits, and \cite{Scarlett2017LowerBO, Vakili2021OptimalOS, Zhu2021PureEI, Camilleri2021HighDimensionalED} further expanded the scope to kernelized bandits. However, these single-agent algorithms often require long learning processes and become less effective under limited sample budgets. Hence, our study focuses on solving the pure exploration problem in a federated manner.

\textbf{Distributed/federated pure exploration }  
Pure exploration in distributed/federated bandits has become a focal point of recent research. Studies by \citet{Hillel2013DistributedEI, Tao2019CollaborativeLW, Karpov2020CollaborativeTD, Mitra2021ExploitingHI, Reda2022NearOptimalCL, Chen2022FederatedBA, Reddy2022AlmostCC} investigated MAB in a synchronous environment, while \cite{Du2021CollaborativePE} explored kernelized bandits, also in a synchronous setting. Designed primarily for synchronous settings, these studies often rely on experimental design to extract exploration sequences. However, such algorithms encounter challenges in asynchronous environments because they rely on 1) global synchronous communication rounds, 2) advance knowledge of the active agent in each round, and 3) the server and agents knowing the time index $t$ in advance. Our solution addresses these challenges and presents the first fully asynchronous algorithms for federated pure exploration with fixed confidence.

\textbf{Distributed/federated regret minimization}
In parallel with pure exploration, \cite{Auer2002FinitetimeAO, AbbasiYadkori2011ImprovedAF, Filippi2010ParametricBT, Agrawal2012ThompsonSF, Chowdhury2017OnKM} pioneered the study of regret minimization in single-agent settings. This problem has recently been extended to distributed/federated bandits, with literature focusing on MAB \cite{Szrnyi2013GossipbasedDS, Korda2016DistributedCO, Wang2019DistributedBL, Mahadik2020FastDB, Shi2021FederatedMB2, Zhu2021FederatedBA, Yang2021CooperativeSB, Yang2022DistributedBW, Yang2023CooperativeMB, Patel2023FederatedOA}, linear bandits \cite{Wu2016ContextualBI, Wang2019DistributedBL, Dubey2020DifferentiallyPrivateFL, Huang2021FederatedLC, Li2022CommunicationEF, Amani2022DistributedCL, Huang2023FederatedLC, Zhou2023OnDP}, kernelized bandits \cite{Li2022CommunicationED}, and neural bandits \cite{Dai2022FederatedNB}. However, these works are confined to synchronous settings. Closer to our approach, \cite{Chen2023OnDemandCF, Li2021AsynchronousUC, He2022ASA, li2023learning} targeted regret minimization in an asynchronous environment. Nevertheless, the objectives of regret minimization differ significantly from those of pure exploration, and none of these works directly addresses our problem.


