% \Wei{Eric Xing: non-convex uncertainty}

% Federated learning (FL) allows multiple parties to jointly train a model by sharing model parameters instead of sending their data to each other. Federated learning has become particularly important in areas of artificial intelligence where users care about data privacy, security, and access rights, including the Internet of Things~\cite{cys+20}, healthcare~\cite{lgd+20,lmx+19}, text data~\cite{hsc+20}, and fraud detection~\cite{zygw20}. Federated learning has already been deployed in industry.


% Compared with the classical centralized learning regime, federated learning poses several unique challenges~\cite{lsts20}, including convergence, communication cost, client robustness, and data heterogeneity. The training data are massively distributed over a very large number of devices, and
% communication between the central server and each device is infrequent. The resulting slow communication has motivated communication-efficient FL algorithms. Federated averaging (FedAvg)~\cite{mmr+17} first addressed the communication-efficiency problem by having each client run multiple local stochastic gradient descent steps and aggregating the results into a global model. The difficulty is that, when each client takes several local steps before communicating, the aggregated update seen by the server is no longer a gradient step at the current global model, because each client's gradients are evaluated at its own local iterates rather than at the global model. This is a challenge not only for optimization but also for sampling.\Wei{expressing the significance of uncertainty is kind of important.}
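
% The server-side claim above can be made concrete with a short worked form of FedAvg. This is a minimal sketch under illustrative assumptions that are not fixed elsewhere in this section: full client participation, aggregation weights $p_c \ge 0$ with $\sum_c p_c = 1$, local objectives $f_c$ with global objective $f = \sum_c p_c f_c$, step size $\eta$, $K$ local steps, and unbiased stochastic gradients $g_c$ with $\mathbb{E}[g_c(w)] = \nabla f_c(w)$. Each client starts from the global model $w_t$, and the server sets $w_{t+1} = \sum_c p_c w^c_{t,K}$, so the update telescopes to
% \begin{align*}
%   w^c_{t,0} &= w_t, \qquad w^c_{t,k+1} = w^c_{t,k} - \eta\, g_c(w^c_{t,k}), \quad k = 0,\dots,K-1,\\
%   w_{t+1} - w_t &= \sum_{c} p_c \bigl(w^c_{t,K} - w_t\bigr) = -\eta \sum_{c} p_c \sum_{k=0}^{K-1} g_c(w^c_{t,k}).
% \end{align*}
% For $K = 1$ this reduces to a stochastic gradient step on $f$ at $w_t$. For $K > 1$ the gradients are evaluated at local iterates $w^c_{t,k} \neq w_t$, so in general the aggregated update is not in the gradient direction of $f$ at $w_t$.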


% Sampling is a fundamental problem in machine learning with extensive applications in many areas, e.g. \Zhao{we need to cite some famous applied papers using sampling}. As data become increasingly distributed, it is natural to consider a regime in which multiple parties want to perform sampling in the federated setting. In this work, we propose a new concept called federated sampling and lay its theoretical foundation.

% Recent experimental results~\cite{agxr21} show that applying federated averaging in the sampling setting converges on real-world data. However, whether the convergence of such sampling algorithms can be established in theory has remained open.

% \begin{center}
%     {\it Can we build a unified and generalized convergence analysis framework for sampling in FL?}
% \end{center}
% In this paper, we answer this question in the affirmative. Moreover, since optimization and sampling differ fundamentally in the classical setting, e.g. \Zhao{Yian to fill}..., we also study a second question:
% \begin{center}
%     {\it What advantages does the sampling framework offer over the optimization framework? For example, can we remove assumptions that are widely used in FL optimization?}
% \end{center}



% We list our contributions as follows:
% \begin{itemize}
%     \item To the best of our knowledge, we propose the first sampling algorithm for federated learning. Compared with classical FL work on optimization, sampling is particularly advantageous for uncertainty estimation (e.g., in healthcare applications).
%   % \item The injected noise also protects privacy.
%     \item % analysis is presented;
%     We prove a convergence result for non-i.i.d.\ data under mild assumptions, which shows that the injected noise, the data heterogeneity, and the stochastic gradient noise all affect convergence. The analysis also sheds light on the optimal number of local updates. Further, unlike classical FL analyses for optimization, we do not assume that the $\ell_2$ norm of the gradient is bounded.
%     \item Finally, within our theoretical framework, we characterize the trade-off between accuracy and the efficacy of federation.
%     %no bounded gradient assumption in $\ell_2$.
% \end{itemize}