\section{Introduction}

%Notes:
%* We observe that DP naturally leads to compression. For small $\epsilon$, there's no use sending more than a few bits. Observed experimentally.
%* We implicitly assume the distribution of inputs is uniform. If we knew it was something other than uniform, we could incorporate it in the formulation.
%* Be precise about "minimum variance" in what sense: already discretized to grid (not necessarily from $[0,1]$)

%* No reason why the same number of input and output discrete points.

%Extension/future work:
%* If we knew the distribution of $p(x)$ we could do vector quantization with differential privacy constraints

%\mike{analysis, or analytics? Or do we want to start talking more broadly about federated computations, with analytics and learning being two examples?}

Federated data analytics is a framework for distributed data analysis and machine learning that applies widely to use-cases involving continuous data collection from many devices. A central server receives responses from a large number of distributed clients and aggregates them to compute a global statistic or a machine learning model. One example is training and fine-tuning a speech-to-text model for a digital assistant: the central server holds the model and continuously updates it based on feedback from client devices about the quality of its predictions on their local data. Another example is maintaining real-time traffic statistics in a city for ride-share demand prediction: a central server at the ride-share company collects and aggregates location data from a large number of user devices.

%\kc{Is this example appropriate or too sensitive?} \mike{Should be fine. There are other papers discussing this sort of application that we could always cite if asked.}

% \mike{Rather than emphasizing low-power devices here (also in the abstract, and throughout the paper), I'd suggest to instead say that devices have low-bandwidth, high-latency uplinks (or more generally, communication-constrained uplink channels). In practice, devices will usually only participate in a federated computation if they are plugged into a reliable power source, so power isn't the issue.}

Most applications of federated data analysis face two major challenges: privacy and compression. Since typical use-cases involve users' personal data, it is important to protect their privacy. This is usually achieved by applying a locally differentially private (LDP) algorithm~\citep{duchi2013local, kasiviswanathan2011can} to the raw inputs on the client device, so that only sanitized data is transmitted to the server. In addition, since clients frequently have low-bandwidth, high-latency uplinks, they should communicate as few bits to the server as possible. Most prior work in this area~\citep{girgis2021shuffled, kairouz2021distributed, agarwal2021skellam} has addressed these two challenges separately: first, a standard LDP algorithm sanitizes the client responses, and then a standard compression procedure compresses them before transmission. Treating the two steps separately, however, degrades the accuracy of the client responses, and ultimately the accuracy of estimation or learning at the server. Moreover, each of these methods is designed for a specific communication budget and is not readily adapted to other budgets.

In this work, we take a closer look at this problem and propose designing the privacy mechanism jointly with the compression procedure. To this end, we introduce a formal property, {\em{asymptotic consistency}}, that any private federated data analysis mechanism should possess. Asymptotic consistency requires that the aggregate statistic computed by the server converge to the corresponding non-private aggregate statistic as the number of clients grows. If the server averages the client responses, then a sufficient condition for asymptotic consistency is that each client sends an unbiased estimate of its input. Perhaps surprisingly, many existing mechanisms are biased and, as a result, are not asymptotically consistent in general.
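To see why unbiasedness suffices, consider the following short derivation, which relies on two assumptions we add for illustration: the $n$ clients draw their randomness independently, and each response has variance at most $\sigma^2$. Write $x_1, \ldots, x_n \in [0, 1]$ for the client inputs and $\mathcal{M}(x_i)$ for the randomized response of client $i$. If $\mathbb{E}[\mathcal{M}(x_i)] = x_i$ for every $i$, then
\begin{equation*}
\mathbb{E}\Big[\frac{1}{n}\sum_{i=1}^n \mathcal{M}(x_i)\Big] = \frac{1}{n}\sum_{i=1}^n x_i,
\qquad
\mathrm{Var}\Big(\frac{1}{n}\sum_{i=1}^n \mathcal{M}(x_i)\Big) \le \frac{\sigma^2}{n},
\end{equation*}
and Chebyshev's inequality gives, for every $t > 0$,
\begin{equation*}
\Pr\Big(\Big|\frac{1}{n}\sum_{i=1}^n \mathcal{M}(x_i) - \frac{1}{n}\sum_{i=1}^n x_i\Big| \ge t\Big) \le \frac{\sigma^2}{n t^2},
\end{equation*}
which tends to $0$ as $n \to \infty$. If instead $\mathbb{E}[\mathcal{M}(x)] = x + \beta(x)$ for some bias $\beta$, the same argument shows that the server's average concentrates around $\frac{1}{n}\sum_i x_i + \frac{1}{n}\sum_i \beta(x_i)$. The gap $\frac{1}{n}\sum_i \beta(x_i)$ does not vanish as $n$ grows when, for instance, every input equals $0$ and $\beta(0) > 0$.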

We first consider designing unbiased mechanisms that, given any communication budget of $b$ bits, transmit a continuous scalar value in the interval $[0, 1]$ with local differential privacy and without public randomness. We observe that many existing methods, such as the truncated Gaussian mechanism, are biased and lead to asymptotically inconsistent outcomes when the inputs lie close to an endpoint of the truncation interval. Motivated by this, we show how to convert two existing locally differentially private mechanisms for transmitting categorical values, bit-wise randomized response~\citep{warner1965randomized} and generalized randomized response, into unbiased mechanisms.
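The following minimal Python sketch illustrates both points under assumptions of our own; it is not necessarily the construction used in this paper. We take clipping $x + \mathcal{N}(0, \sigma^2)$ to $[0, 1]$ as one common form of truncation. To make generalized randomized response unbiased, we use a standard recipe: the client stochastically rounds $x$ to the grid $\{0, 1/(2^b - 1), \ldots, 1\}$ and applies $2^b$-ary randomized response to the grid index, which is its $b$-bit message. The server then inverts the affine map from a grid value $y$ to the expected reported value. The function names \texttt{clipped\_gaussian}, \texttt{stochastic\_round}, and \texttt{grr\_unbiased} are hypothetical.

```python
import numpy as np

# Hypothetical helpers: a sketch under stated assumptions, not the paper's construction.

def clipped_gaussian(x, sigma, rng):
    # Add Gaussian noise, then clip to [0, 1] (one common form of truncation).
    return np.clip(x + sigma * rng.standard_normal(np.shape(x)), 0.0, 1.0)

def stochastic_round(x, k, rng):
    # Unbiasedly round x in [0, 1] to the grid {0, 1/(k-1), ..., 1}; returns grid indices.
    scaled = np.asarray(x) * (k - 1)
    low = np.floor(scaled)
    # Round up with probability equal to the fractional part, so E[index] = scaled.
    up = rng.random(np.shape(scaled)) < (scaled - low)
    return (low + up).astype(int)

def grr_unbiased(x, b, eps, rng):
    """Client: stochastic rounding + k-ary randomized response with k = 2**b.
    Server: affine debiasing of the reported grid value."""
    k = 2 ** b
    grid = np.linspace(0.0, 1.0, k)
    idx = stochastic_round(x, k, rng)
    p = np.exp(eps) / (np.exp(eps) + k - 1)  # probability of reporting the true index
    keep = rng.random(np.shape(idx)) < p
    # Otherwise report one of the other k - 1 indices uniformly at random.
    other = (idx + rng.integers(1, k, size=np.shape(idx))) % k
    reported = np.where(keep, idx, other)  # this index is the b-bit message
    # E[grid[reported] | y] = a * y + c, so (grid[reported] - c) / a is unbiased for y,
    # and y itself is unbiased for x because of stochastic rounding.
    a = p - (1 - p) / (k - 1)
    c = (1 - p) * grid.sum() / (k - 1)
    return (grid[reported] - c) / a

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.zeros(200_000)  # every client holds the endpoint value 0
    print("clipped Gaussian mean:", clipped_gaussian(x, 1.0, rng).mean())
    print("debiased GRR mean:   ", grr_unbiased(x, b=2, eps=1.0, rng=rng).mean())
```

With every client holding $x = 0$ and $\sigma = 1$, the clipped-Gaussian average stays near $0.32$ no matter how many clients participate, whereas the debiased randomized-response average approaches $0$. The debiased values can fall outside $[0, 1]$; this is the price of unbiasedness and does no harm when the server only averages.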

We then propose a novel mechanism, the Minimum Variance Unbiased (MVU) mechanism, which, given $b$ bits of communication, exploits the ordinal nature of the inputs to provide a better privacy-accuracy trade-off. We show that if the input is drawn uniformly from the set $\{0, 1/(2^b - 1), \ldots, 1\}$, then the MVU mechanism has minimum variance among all unbiased mechanisms that satisfy the local differential privacy constraints. We show how to adapt this mechanism to metric differential privacy~\citep{andres2013geo} for location privacy applications. To apply it to differentially private SGD (DP-SGD;~\citet{abadi2016deep}), we extend it to vectors within an $L_p$-ball and establish tight privacy composition guarantees.
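To make the design space concrete, the sketch below represents a $b$-bit mechanism on a $k$-point input grid as a row-stochastic matrix $P$, where $P_{ij}$ is the probability of sending message $j$ on input $x_i$, together with values $\alpha_j$ that the server decodes. This parametrization is our assumption for illustration and is not necessarily the formulation used later in the paper. The function checks the $\epsilon$-LDP ratio constraint $P_{ij} \le e^{\epsilon} P_{i'j}$ and the unbiasedness constraint $\sum_j P_{ij}\alpha_j = x_i$, and it reports the variance averaged over a uniform input. A minimum variance unbiased design minimizes that last quantity subject to the first two constraints; we do not show how to solve this program, which is bilinear in $P$ and $\alpha$. The names \texttt{check\_mechanism} and \texttt{grr\_instance} are hypothetical.

```python
import numpy as np

# Hypothetical helpers: a sketch of the constraints, not the paper's solver.

def check_mechanism(P, alpha, x, eps, tol=1e-9):
    """P[i, j] = Pr(message j | input x[i]); alpha[j] = value decoded for message j.
    Returns (is_eps_ldp, is_unbiased, variance averaged over a uniform input)."""
    P, alpha, x = map(np.asarray, (P, alpha, x))
    assert np.allclose(P.sum(axis=1), 1.0), "each row must be a probability distribution"
    # eps-LDP: for every message j, max_i P[i, j] <= e^eps * min_i P[i, j].
    ldp = bool(np.all(P.max(axis=0) <= np.exp(eps) * P.min(axis=0) + tol))
    # Unbiasedness: E[alpha_J | x_i] = sum_j P[i, j] * alpha[j] must equal x_i.
    mean = P @ alpha
    unbiased = bool(np.allclose(mean, x, atol=1e-9))
    # Per-input variance E[alpha_J^2 | x_i] - (E[alpha_J | x_i])^2, averaged uniformly.
    var = P @ alpha**2 - mean**2
    return ldp, unbiased, float(var.mean())

def grr_instance(k, eps):
    """Debiased k-ary randomized response on the grid {0, 1/(k-1), ..., 1}:
    one feasible point of the design problem, used here only as an example."""
    x = np.linspace(0.0, 1.0, k)
    p = np.exp(eps) / (np.exp(eps) + k - 1)
    P = np.full((k, k), (1 - p) / (k - 1))
    np.fill_diagonal(P, p)
    a = p - (1 - p) / (k - 1)
    c = (1 - p) * x.sum() / (k - 1)
    return P, (x - c) / a, x

if __name__ == "__main__":
    print(check_mechanism(*grr_instance(4, 1.0), eps=1.0))
```

For instance, \texttt{check\_mechanism(*grr\_instance(4, 1.0), eps=1.0)} reports that debiased $4$-ary randomized response is feasible (\texttt{True}, \texttt{True}) with strictly positive average variance, while the same matrix fails the LDP check at $\epsilon = 0.5$.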

Finally, we evaluate the empirical performance of the MVU mechanism in two concrete use-cases: distributed mean estimation and private federated learning. In each case, we compare our method with several existing baselines and show that it can achieve better utility under the same privacy guarantee. In particular, we show that the MVU mechanism can match the performance of specially designed gradient compression schemes such as stochastic signSGD~\citep{jin2020stochastic} for DP-SGD training of neural networks at the same communication budget.

%In the latter case, we use our mechanism to do DP-SGD, and show that we can get significantly better privacy-utility trade-offs than regular DP-SGD in the high-privacy regime.  \kc{a couple of sentences lifted from the experiments section}

%\mike{Either in the intro, or in the conclusion/discussion, we could point out that the proposed mechanism is directly compatible with existing secure aggregation mechanisms, which are also commonly used in federated computations. All secure aggregation mechanisms (at least, all that I'm aware of, maybe Chuan can confirm?) perform computations over a finite field, and so first require conversion to a fixed-point representation. Other compression schemes in the literature, such as those involving sparsification (top-k or random k), are not as directly amenable to use in conjunction with secure aggregation.}


