\section{Conclusion and future work}\label{sec:concl}
\vskip -0.05in % only used in ICLR'22 submission.

We propose a novel convergence analysis for federated averaging Langevin dynamics (FA-LD) with distributed clients. Unlike the optimization-driven literature in federated learning, our results do not require the gradients to be bounded in $\ell_2$ norm. The theoretical guarantees yield concrete guidance on selecting the optimal number of local updates. In addition, the convergence depends strongly on the data heterogeneity and the injected noise; the latter motivates us to consider correlated injected noise to balance the efficacy of federation against accuracy.


Our work initiates the theoretical study of standard sampling algorithms in federated learning and paves the way for future work on advanced Monte Carlo methods, such as Hamiltonian Monte Carlo \cite{Neal12, Mangoubi18_leapfrog, Chen_Vempala}, underdamped Langevin dynamics \cite{ccbj18}, and replica exchange Monte Carlo (also known as parallel tempering) \cite{deng2020, deng_VR} %, and dynamic importance sampling \cite{CSGLD} 
in the federated setting. It would also be interesting to study efficient bias-free device-sampling schemes that mitigate the straggler effect in real-world applications.  %and the optimal number of local steps under non-strongly convex \cite{dk17} or non-convex assumptions \cite{Maxim17, ma19}.