\section{Related Work}
\label{sec:relwork}



%The operation of quantizing a value $x \mapsto q$  can always be viewed as adding noise to $x$, but in general the distribution of the errors $x - q$ is complicated and depends on $x$. However, quantization via subtractive dithering~\cite{Gray1993dithered} has the remarkable property that the errors are independent of the input $x$, and the distribution of the errors can be precisely characterized. %\citet{Cormode2021bit} consider a similar problem of compression and asymptotic consistency in federated analytics.

Federated data analysis with local DP is now a standard solution for analyzing sensitive data held across many user devices. A body of work~\citep{erlingsson2014rappor, kairouz2016discrete, acharya2019hadamard} provides methods for analytics over categorical data; the main methods are Randomized Response~\citep{warner1965randomized}, RAPPOR~\citep{erlingsson2014rappor}, and the Hadamard Mechanism~\citep{acharya2019hadamard}. \citet{chen2020breaking} show that the Hadamard Mechanism uses near-optimal communication for categorical data.

% The problem is more complex for real-valued data. For learning, the standard approach is for each device to send the gradient of the model computed over its locally held data plus suitably calibrated zero-mean Gaussian noise for privacy~\citep{abadi2016deep}. The noisy gradients are aggregated at the server, sometimes using secure aggregation~\citep{konevcny2016federated}. This method is private and asymptotically consistent when the noisy gradients are sent at full precision. However, asymptotic consistency fails if the noisy gradients are truncated. Some works, such as the Discrete Gaussian Mechanism~\citep{kairouz2021distributed} and the Skellam Mechanism~\citep{agarwal2021skellam} have considered discretization of gradients primarily for secure aggregation. These methods, however, require a communication budget that is high enough that the result is close to being unbiased.

In work on federated statistics and learning for real-valued data, \citet{Cormode2021bit} provide asymptotically consistent algorithms for transmitting scalars. %They focus on the extreme compression setting in the scalar case, where each client sends a single bit. \mike{They describe variants of the approach where clients send multiple bits too.}
They propose to first sample one index, or a subset of indices, of the bits in the fixed-point representation of the input, and then to apply randomized response independently to each sampled bit. \citet{girgis2020shuffled} provide mechanisms for distributed mean estimation of vectors inside unit $L_p$ balls. Unlike our method, which provides a near-optimal solution under any given communication budget, their methods use specific communication budgets and do not readily generalize to an arbitrary budget $b$. Finally, \citet{Amiri2021compressive} propose to obtain a quantized DP mechanism by composing subtractive dithering with the Gaussian mechanism and performing privacy accounting that factors in both. In contrast, we simply use (non-subtractive) dithering to first obtain a fixed-point representation, and then design a mechanism that quantizes and provides DP.
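
As a rough illustration of the bit-sampling idea above (a simplified sketch in notation assumed here, not the exact algorithm of \citet{Cormode2021bit}), suppose $x \in [0,1)$ has the $k$-bit fixed-point expansion $x = \sum_{j=1}^{k} b_j 2^{-j}$ with $b_j \in \{0,1\}$, a single index $J$ is sampled with probability $p_J > 0$, and $\tilde{y}$ is the output of $\epsilon$-DP randomized response applied to $b_J$, i.e., $\tilde{y} = b_J$ with probability $e^{\epsilon}/(1+e^{\epsilon})$ and $\tilde{y} = 1 - b_J$ otherwise. Under these assumptions, the estimators
\[
  \hat{b}_J = \frac{(e^{\epsilon}+1)\,\tilde{y} - 1}{e^{\epsilon}-1},
  \qquad
  \hat{x} = \frac{2^{-J}\,\hat{b}_J}{p_J}
\]
satisfy $\mathrm{E}[\hat{b}_J \mid J] = b_J$ and hence $\mathrm{E}[\hat{x}] = \sum_{j=1}^{k} p_j \cdot 2^{-j} b_j / p_j = x$, so averaging $\hat{x}$ over many clients yields an unbiased estimate of the mean while each client transmits only the sampled index and one privatized bit.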



%Discrete Gaussian and Skellam. (Not unbiased.) Adds enough communication to cover standard deviation. (Requires a high communication budget, enough that the result is close to being unbiased.)

%A body of prior work has looked at the impact of compression in FL. 
%--> \cite{kamath2019} considers categorical data and shows that the Hadamard Mechanism is near-optimal in terms of communication
%--> Cormode's mean estimation using an adaptive algorithm. Modified adaptive form on unbiased RR, currently applies only to scalars. 
%--> Shuffle paper w/ Girgis CLDP. We compare with them. Their solution applies to $\log d$ bits, unclear how to adapt them to any fixed communication budget $b$. 

A large body of work focuses on federated optimization methods with compressed communication~\citep{konevcny2016federated,horvath2019stochastic,das2020faster,haddadpour2021federated,gorbunov21marina}. Most of these works propose biased compression methods (e.g., top-$k$ sparsification), which require error feedback to avoid compounding errors~\citep{seide2014one,stich2020error}. However, error feedback is inherently incompatible with DP~\citep{jin2020stochastic}, whereas our MVU mechanism is compatible with DP.