\section{Preliminaries}

In private federated data analysis, a central server calculates aggregate statistics based on sensitive inputs from $n$ clients. The statistics might be as simple as the prevalence of some event, or as complicated as the gradient of a large neural network. To preserve privacy, the clients transmit a sanitized version of their input to the server. Two popular privacy notions used for sanitization are local differential privacy~\citep{duchi2013local, kasiviswanathan2011can} and metric differential privacy~\citep{andres2013geo}.

\subsection{Privacy Definitions}

%-- local differential privacy
%-- metric differential privacy

\begin{definition}
A randomized mechanism $\calM$ with domain $\dom(\calM)$ and range $\range(\calM)$ is said to be $\epsilon$-local differentially private (LDP) if for all pairs $x$ and $x'$ in the domain of $\calM$ and any $S \subseteq \range(\calM)$, we have that:
\[ \Pr(\calM(x) \in S) \leq e^{\epsilon} \Pr(\calM(x') \in S). \]
\end{definition}
Here $\epsilon$ is a privacy parameter where lower $\epsilon$ implies better privacy. The LDP mechanism $\calM$ is run on the client side, and the result is transmitted to the server. We assume that the clients and the server do not share any randomness. It might appear that a local DP requirement implies that a client's response contains very little useful information. While each individual response may be highly noisy, the server is still able to obtain a fairly accurate estimate of an \emph{aggregate property} if there are enough clients. Thus, the challenge in private federated data analysis is to design protocols --- privacy mechanisms for clients and aggregation algorithms for servers --- so that client privacy is preserved, and the server can obtain an accurate estimate of the desired statistic. 
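
As a concrete illustration, consider binary randomized response, a standard mechanism that is used here only as an example and is not one of the protocols developed in this paper. For an input $x \in \{0, 1\}$, suppose
\[ \Pr(\calM(x) = x) = \frac{e^{\epsilon}}{1 + e^{\epsilon}}, \qquad \Pr(\calM(x) = 1 - x) = \frac{1}{1 + e^{\epsilon}}. \]
For every output $y \in \{0, 1\}$ and every pair $x, x'$, the ratio $\Pr(\calM(x) = y) / \Pr(\calM(x') = y)$ is $1$, $e^{\epsilon}$, or $e^{-\epsilon}$, so $\calM$ is $\epsilon$-LDP. Writing $p = e^{\epsilon} / (1 + e^{\epsilon})$ (a shorthand introduced for this example), we have $\bbE[\calM(x)] = (1 - p) + (2p - 1) x$, so the rescaled response $(\calM(x) - (1 - p)) / (2p - 1)$ has expectation $x$ whenever $\epsilon > 0$. This is why averaging many noisy responses can still recover the prevalence of an event.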

A related definition is metric differential privacy (metric DP)~\citep{chatzikokolakis2013broadening}, which is also known as geo-indistinguishability~\citep{andres2013geo} and is commonly used to quantify location privacy.

\begin{definition}\label{def:metricdp}
A randomized mechanism $\calM$ with domain $\dom(\calM)$ and range $\range(\calM)$ is said to be $\epsilon$-metric DP with respect to a metric $d$ if for all pairs $x$ and $x'$ in the domain of $\calM$ and any $S \subseteq \range(\calM)$, we have that:
\[ \Pr(\calM(x) \in S) \leq e^{\epsilon d(x, x')} \Pr(\calM(x') \in S). \]
\end{definition}

Metric DP offers granular privacy that is quantified by the metric $d$ -- inputs $x$ and $x'$ that are close in $d$ are indistinguishable, while those that are far apart in $d$ are less so. 
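
For intuition, here is one standard instance, stated under the assumption that the domain of $\calM$ is the real line and $d(x, x') = |x - x'|$. Let $\calM(x) = x + Z$, where $Z$ has the Laplace density $\frac{\epsilon}{2} e^{-\epsilon |z|}$. The density of $\calM(x)$ at a point $y$ is $p_x(y) = \frac{\epsilon}{2} e^{-\epsilon |y - x|}$ (the notation $p_x$ is introduced only for this example), and the triangle inequality gives
\[ \frac{p_x(y)}{p_{x'}(y)} = e^{\epsilon (|y - x'| - |y - x|)} \leq e^{\epsilon |x - x'|} = e^{\epsilon d(x, x')}. \]
Integrating over any measurable $S$ yields the inequality in Definition~\ref{def:metricdp}: nearby inputs produce nearly identical output distributions, while distant inputs are allowed to be distinguishable.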
%For location privacy, this property allows us to obscure fine-grained location such as buildings and city-blocks, while still allowing the transmission of coarse-grained location such as zip-codes and cities. 

\subsection{Problem Statement}

In addition to balancing privacy and accuracy, a bottleneck of federated analytics is communication since client devices typically have limited network bandwidth. Thus, the goal is to achieve privacy and accuracy along with a limited amount of communication between clients and servers. We formalize this problem as follows.  

\begin{problem}\label{prob:fl}
Suppose we have $n$ clients with sensitive data $x_1, \ldots, x_n$ where each $x_i$ lies in a domain $\calX$, and a central server $S$ seeks to approximate an aggregate statistic $\calT_n$. Our goal is to design two algorithms, a client-side mechanism $\calM$ and a server-side aggregation procedure $\calA_n$, such that the following conditions hold:
\begin{enumerate}
\item $\calM$ is $\epsilon$-local DP (or $\epsilon$-metric DP). 
\item The output of $\calM$ can be encoded in $b$ bits.
\item $\calA_n(\calM(x_1), \ldots, \calM(x_n))$ is a good approximation to $\calT_n(x_1, \ldots, x_n)$. 
\end{enumerate}
\end{problem}

Prior works addressed the communication challenge by having clients apply a standard local DP mechanism followed by a standard quantization step. We instead develop methods in which the privacy mechanism and the quantizer are designed jointly, so as to obtain high accuracy at the server.


%-- federated analytics with local dp and a communication budget
%-- goal is to get high accuracy as well as low communication


\subsection{Asymptotic Consistency}

We posit that any good federated analytics solution $(\calM, \calA_n)$, where $\calM$ is a client mechanism and $\calA_n$ is the server-side aggregation procedure, should have an {\em{asymptotic consistency}} property. Loosely speaking, this property ensures that the server can approximate the target statistic $\calT_n$ arbitrarily well given enough clients. Formally,

\begin{definition}
We say that a private federated analytics protocol is {\em{asymptotically consistent}} if the output of the server's aggregation algorithm $\calA_n( \calM(x_1), \ldots, \calM(x_n))$ approaches the target statistic $\calT_n(x_1, \ldots, x_n)$ as $n \rightarrow \infty$. In other words, for any $\alpha, \delta > 0$, there exists an $n_0$ such that for all $n \geq n_0$, we have:
\[ \Pr(| \calA_n(\calM(x_1), \ldots, \calM(x_n)) - \calT_n(x_1, \ldots, x_n)| \geq \alpha) \leq \delta. \]
\end{definition}

While the server can use any aggregation protocol $\calA_n$, the most common choice is a simple average of the client responses: $\calA_n(\calM(x_1), \ldots, \calM(x_n)) = \frac{1}{n} \sum_i \calM(x_i)$. It is easy to show the following lemma.

%that if $\calM(x)$ is unbiased for all $x$ -- that is, if $\bbE[\calM(x)] = x$ for all $x$, then the entire solution is asymptotically consistent. 

\begin{lemma} \label{lem:unbiased}
If $\calM(x)$ is unbiased for all $x$ (that is, $\bbE[\calM(x)] = x$) and has bounded variance, if $\calA_n$ computes the average of the client responses, and if the target statistic $\calT_n$ is the average of the inputs, then the federated analytics solution is asymptotically consistent.
\end{lemma}
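
A short argument for Lemma~\ref{lem:unbiased} can be sketched under three assumptions that we state explicitly: the target statistic is the mean $\calT_n(x_1, \ldots, x_n) = \frac{1}{n} \sum_i x_i$, the clients randomize independently, and the variance of $\calM(x)$ is at most $\sigma^2$ for every $x$ (the symbol $\sigma^2$ is introduced here only for illustration). Unbiasedness gives $\bbE[\calA_n(\calM(x_1), \ldots, \calM(x_n))] = \calT_n(x_1, \ldots, x_n)$, and independence bounds the variance of the average by $\sigma^2 / n$. Chebyshev's inequality then yields
\[ \Pr(| \calA_n(\calM(x_1), \ldots, \calM(x_n)) - \calT_n(x_1, \ldots, x_n)| \geq \alpha) \leq \frac{\sigma^2}{n \alpha^2}, \]
which is at most $\delta$ whenever $n \geq \sigma^2 / (\alpha^2 \delta)$.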

While asymptotic consistency may seem basic, it is surprisingly not satisfied by a number of simple solutions. An example is when $\calM(x)$ is a Gaussian mechanism whose output is truncated to an interval $[a, b]$. In this case, if $x_i = a$ for all $i$, the truncated Gaussian mechanism will be biased with $\bbE[\calM(x_i)] > x_i$, and consequently the server's aggregate will not approach $a$ for any number of clients.% even with an infinite number of clients. 
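
To make the bias concrete, suppose (as one reading of truncation) that the mechanism clips its output, so that $\calM(x) = \min(\max(x + Z, a), b)$ with $Z \sim \mathcal{N}(0, \sigma^2)$; the noise scale $\sigma$ is notation introduced for this illustration. At $x = a$,
\[ \bbE[\calM(a)] - a = \bbE[\min(\max(Z, 0), b - a)] > 0, \]
since the clipped noise is nonnegative and is strictly positive with probability $1/2$. This bias does not depend on $n$, so the average of the responses concentrates around $a$ plus the bias instead of around $a$. When $b - a$ is large relative to $\sigma$, the bias is close to $\bbE[\max(Z, 0)] = \sigma / \sqrt{2\pi}$.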

Some of the recently proposed solutions for federated learning are also not guaranteed to be asymptotically consistent. Examples include the truncated Discrete Gaussian mechanism~\citep{canonne2020discrete, kairouz2021distributed} as well as the Skellam mechanism~\citep{agarwal2021skellam}. While these mechanisms are unbiased if the range is unbounded and there are no communication constraints, their results do become biased after truncation. 

%In Section~\ref{sec:scalar}, we will propose some protocols that are asymptotically consistent. 


\subsection{Compression Tool: Dithering}

A core component of our proposed mechanisms is dithering -- a popular approach to quantization with a long history of use in communications~\citep{Schuchman1964dither,Gray1993dithered}, signal processing~\citep{Lipshitz1992quantization}, and more recently for communication-efficient distributed learning~\citep{Alistarh2017qsgd,Shlezinger2020uveqfed}. Suppose our goal is to quantize a scalar value $x \in [0,1]$ with a communication budget of $b$ bits. We consider the $B=2^b$ points $G=\{0, \frac{1}{B-1}, \frac{2}{B-1}, \dots, 1\}$ as the quantization lattice; \emph{i.e.}, the $B$ points uniformly spaced by $\Delta = 1/(B-1)$. Dithering can be seen as a random quantization function $\Dither : [0,1] \rightarrow G$ that is unbiased, \emph{i.e.}, $\bbE[\Dither(x)] = x$.\footnote{When the number of grid points $B$ is clear from the context, we simply write $\Dither(x)$ to simplify notation; otherwise we write $\Dither_B(x)$ to indicate the value of $B$.}  Moreover, the distribution of the quantization errors $\Dither(x) - x$ can be made independent of the distribution of $x$.

While there are many forms of dithered quantization~\citep{Gray1993dithered}, we focus on the following. If $x \in [\frac{i}{B-1}, \frac{i+1}{B-1})$ where $0 \leq i \leq B-1$, then $\Dither(x) = \frac{i}{B-1}$ with probability $(B-1) (\frac{i+1}{B-1} - x)$, and $\Dither(x) = \frac{i+1}{B-1}$ with probability $(B-1)(x - \frac{i}{B-1})$. A simple calculation shows that $\bbE[\Dither(x)] = x$ and moreover that the variance satisfies $\bbE[(\Dither(x) - x)^2] \le \Delta^2 / 4$. This procedure is equivalent to the \emph{non-subtractive} dithering scheme $\Dither(x) = \arg\min_{q \in G} |q - (x - U)|$, which rounds the perturbed input $x - U$ to the nearest grid point, where $U$ is uniformly distributed over the interval $[-\Delta/2, \Delta/2]$; see, e.g.,~\cite[Lemma~2]{Aysal2008distributed}.
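
The calculation behind the unbiasedness and variance claims can be written out as follows; the shorthand $p$ is introduced here for convenience. For $x \in [i\Delta, (i+1)\Delta)$, let $p = (x - i\Delta)/\Delta \in [0, 1)$, so that $\Dither(x) = (i+1)\Delta$ with probability $p$ and $\Dither(x) = i\Delta$ with probability $1 - p$. Then
\[ \bbE[\Dither(x)] = (1 - p)\, i\Delta + p\, (i+1)\Delta = i\Delta + p\Delta = x, \]
\[ \bbE[(\Dither(x) - x)^2] = (1 - p)(p\Delta)^2 + p\,((1 - p)\Delta)^2 = p(1 - p)\Delta^2 \leq \frac{\Delta^2}{4}. \]
For example, with $b = 2$ bits we have $B = 4$ and $\Delta = 1/3$. The input $x = 1/2$ lies in $[1/3, 2/3)$ with $p = 1/2$, so $\Dither(x)$ equals $1/3$ or $2/3$ with probability $1/2$ each, and the variance attains the bound $\Delta^2 / 4 = 1/36$.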

%To quantize vectors, we do coordinate-wise quantization and rejection sampling if the vector norm is above a particular bound. 





