
\vspace{-2mm}
\subsection{Differential Privacy Guarantees}
\vspace{-2mm}
We consider $(\epsilon,\delta)$-differential privacy with respect to the substitute-one relation $\simeq_{s}$ \cite{NEURIPS2018_3b5020bb}. We write $\cS\simeq_{s}\cS'$ for two datasets $\cS$ and $\cS'$ if they have the same size and differ in exactly one data point. For $\epsilon\ge 0$ and $\delta\in[0,1]$, a mechanism $\M$ is $(\epsilon,\delta)$-differentially private w.r.t. $\simeq_{s}$ if, for any pair of input datasets $\cS\simeq_{s}\cS'$
and every measurable subset $E\subset \textup{Range}(\M)$, we have
\begin{equation} \label{eq:DP-def}
\prob[\M(\cS)\in E]\le e^{\epsilon}\prob[\M(\cS')\in E]+\delta.
\end{equation}

% Since Algorithm \ref{alg:alg_main_paper_text_different_seeds} is a special case of Algorithm \ref{alg:alg_main_text_partial_main} under device-sampling scheme II with $S=N$ and Algorithm \ref{alg:alg_main_paper_text_independent_noise} is a special case of Algorithm \ref{alg:alg_main_paper_text_different_seeds} when $\rho=0$, it suffices to analyze the differential privacy guarantee of Algorithm \ref{alg:alg_main_text_partial_main}. 

%As FedAvg algorithms can be divided into the processes of local updates, synchronization, and broadcasting with risks of information leakage in synchronization (local model uploading and aggregation) and broadcasting, we consider the differential privacy guarantees in synchronization and broadcasting similar to \cite{wei2020federated}. Since there is no involvement of data in model aggregation and broadcasting, they are post-processing processes. Thus, it suffices to analyze the differential privacy guarantees in local model uploading.

\vspace{-2mm}
\vspace{-2mm}
Since partial device participation is the more general setting, we analyze the differential privacy guarantee under partial device participation. Here we present the result under scheme II; for the result under scheme I, see Theorem \ref{thm:privacy_alg3_full} in the appendix.
\begin{theorem}[Partial version of Theorem \ref{thm:privacy_alg3_full}] \label{thm:privacy_alg3}
Suppose Assumptions \ref{assump:bdd_sens} and \ref{assump:grad_est} hold. For any $\delta_0\in(0,1)$, if $\eta\in\left(0,\frac{\tau(1-\rho^2)\gamma^2\min_{c\in[N]}p_{c}}{\Delta_l^2\log(1.25/\delta_0)}\right]$, then Algorithm \ref{alg:alg_main_text_partial_main} under scheme II is $(\epsilon^{(3)}_{K,T},\delta^{(3)}_{K,T})$-differentially private w.r.t. $\simeq_{s}$ after $T$ iterations ($T=EK$ with $E\in \mathbb{N}$, $E\ge 1$), where
\begin{align*}
%\label{eq:epsilon_alg3_main}
&\epsilon^{(3)}_{K,T}=\tilde\epsilon_K\min\left\{\sqrt{\frac{2T}{K}\log\left(\frac{1}{\delta_2}\right)} + \frac{TS (e^{\epsilon_{K}}-1)}{KN},\ \frac{T}{K}\right\},\\
&\delta^{(3)}_{K,T}=\frac{S}{N}\gamma T\delta_0+ \frac{TS}{KN}\delta_1+\delta_2,
\end{align*}
with $\tilde\epsilon_K= \log\left(1+\frac{S}{N}\left(e^{\epsilon_K}-1\right)\right)$, 
$\epsilon_K=\epsilon_1\min\left\{\sqrt{2K\log(1/\delta_1)} + K(e^{\epsilon_1}-1),\ 
K\right\}$, \\
$\epsilon_1=2\Delta_l \sqrt{\frac{\eta\log(1.25/\delta_0)}{\tau(1-\rho^2)\min_{c\in [N]}p_{c}}}$,
%\begin{align*}
%\label{eq:DP_K_partial_main}
%&\tilde\epsilon_K= 
%\log\left(1+\frac{S}{N}\left(e^{\epsilon_K}-1\right)\right),\\
%\label{eq:epsilon_K_main}
%&\epsilon_K=\min\left\{\sqrt{2K\log(1/\delta_1)}\epsilon_1 + K\epsilon_1(e^{\epsilon_1}-1),\ 
%K\epsilon_1\right\},\\
%\label{eq:epsilon1_def_main}
%&\epsilon_1=c(\delta_0)\Delta_l \sqrt{\frac{2\eta}{\tau(1-\rho^2)\min_{c\in [N]}p_{c}}},
%\end{align*} 
and $\delta_1,\delta_2\in[0,1)$.
\end{theorem}
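The structure of $\epsilon^{(3)}_{K,T}$ can be read off in two steps. The following is an interpretive sketch, which assumes that the bound arises from privacy amplification by subsampling followed by advanced composition; it is not a substitute for the proof. Exponentiating the definition of $\tilde\epsilon_K$ gives
\begin{equation*}
e^{\tilde\epsilon_K}-1=\frac{S}{N}\left(e^{\epsilon_K}-1\right),
\end{equation*}
so the first argument of the minimum in $\epsilon^{(3)}_{K,T}$ can be rewritten as
\begin{equation*}
\tilde\epsilon_K\left(\sqrt{\frac{2T}{K}\log\left(\frac{1}{\delta_2}\right)}+\frac{TS(e^{\epsilon_K}-1)}{KN}\right)
=\tilde\epsilon_K\sqrt{\frac{2T}{K}\log\left(\frac{1}{\delta_2}\right)}+\frac{T}{K}\,\tilde\epsilon_K\left(e^{\tilde\epsilon_K}-1\right).
\end{equation*}
The right-hand side has the form of an advanced composition bound over $T/K$ rounds, each with privacy parameter $\tilde\epsilon_K$, and with slack $\delta_2$. The second argument, $\frac{T}{K}\tilde\epsilon_K$, is the corresponding basic composition bound. The quantity $\epsilon_K$ has the same two-branch form over $K$ steps, with per-step parameter $\epsilon_1$ and slack $\delta_1$.
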
According to Theorem \ref{thm:privacy_alg3} and Section \ref{sec:DP_discussion}, Algorithm \ref{alg:alg_main_text_partial_main} is at least $\left(\frac{T}{K}\log\left(1+\frac{S}{N}(e^{K\epsilon_1}-1)\right),\frac{S}{N}\gamma T\delta_0\right)$-differentially private.
Moreover, if
$\eta=O\left(\frac{\tau(1-\rho^2)N^2\min_{c\in[N]}p_c\log(1/\delta_2)}{\Delta_l^2 S^2 T\log(1/\delta_0) \log(1/\delta_1)}\right)$, then $\epsilon_{K,T}^{(3)}=O\left(\frac{S\Delta_l}{N}\sqrt{\frac{\eta T \log(1/\delta_0)\log(1/\delta_1)\log(1/\delta_2)}{\tau(1-\rho^2)\min_{c\in[N]}p_c}}\right)$.
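
Both statements can be checked against the expressions in Theorem \ref{thm:privacy_alg3}. The derivation below is a sketch: the first part uses only the theorem statement, with the convention $\log(1/0)=\infty$, while the second part is a leading-order heuristic under the additional assumption that $\epsilon_1$ and $\epsilon_K$ are small enough that $e^{x}-1\approx x$. For the first statement, set $\delta_1=\delta_2=0$, so that each minimum is attained by its second argument:
\begin{align*}
&\epsilon_K=K\epsilon_1,\qquad
\epsilon^{(3)}_{K,T}=\frac{T}{K}\tilde\epsilon_K=\frac{T}{K}\log\left(1+\frac{S}{N}\left(e^{K\epsilon_1}-1\right)\right),\\
&\delta^{(3)}_{K,T}=\frac{S}{N}\gamma T\delta_0.
\end{align*}
For the second statement, keep only the square-root terms and linearize $\tilde\epsilon_K$:
\begin{align*}
&\epsilon_K\approx\epsilon_1\sqrt{2K\log(1/\delta_1)},\qquad
\tilde\epsilon_K\approx\frac{S}{N}\epsilon_K,\\
&\epsilon^{(3)}_{K,T}\approx\tilde\epsilon_K\sqrt{\frac{2T}{K}\log\left(\frac{1}{\delta_2}\right)}
\approx\frac{2S}{N}\epsilon_1\sqrt{T\log(1/\delta_1)\log(1/\delta_2)}.
\end{align*}
Substituting $\epsilon_1=2\Delta_l \sqrt{\frac{\eta\log(1.25/\delta_0)}{\tau(1-\rho^2)\min_{c\in [N]}p_{c}}}$ recovers the stated rate up to constants. Under the same approximations, the condition on $\eta$ appears to be what keeps the term $\frac{T}{K}\left(e^{\tilde\epsilon_K}-1\right)$ of at most the same order as $\sqrt{\frac{T}{K}\log(1/\delta_2)}$.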

There is a trade-off between privacy and utility. By Theorem \ref{thm:privacy_alg3}, $\epsilon^{(3)}_{K,T}$ is increasing in $\frac{\eta}{\tau(1-\rho^2)}$, $\frac{S}{N}$, and $T$, and $\delta^{(3)}_{K,T}$ is increasing in $\frac{S}{N}$, $\gamma$, and $T$. However, by Theorem \ref{thm:partial_II}, the upper bound on $W_{2}(\mu_T,\mu)$ is decreasing in $\rho$, $T$, and $S$ and increasing in $\tau$ and $N$. For fixed $T$, there is an optimal $\eta$ that minimizes %the upper-bound of
$W_{2}(\mu_T,\mu)$, whereas $\epsilon^{(3)}_{K,T}$ can be made arbitrarily small by decreasing $\eta$. In practice, users can tune the hyper-parameters according to their DP and accuracy budgets. For example, under a DP budget $(\epsilon_*,\delta_*)$, we can select the largest $\rho\in[0,1]$ and $S\in[N]$ such that $\epsilon^{(3)}_{K,T}\le \epsilon_*$ and $\delta^{(3)}_{K,T}\le \delta_*$ to achieve the target error $W_{2}(\mu_{T},\mu)$.



