\section{Scalar Mechanisms}
\label{sec:scalar}

We consider Problem~\ref{prob:fl} when the input $x_i$ is a scalar in the interval $[0, 1]$, and the statistic\footnote{To simplify notation, we drop the subscript $n$ from statistics $\calT_n$ and aggregation functions $\calA_n$ when the number of clients $n$ is clear from the context.} $\calT$ is the average $\frac{1}{n}\sum_{i=1}^{n} x_i$. Our server-side aggregation protocol also outputs the average of the client responses. Our goal is to design a client-side mechanism $\calM$ that is $\epsilon$-local DP, unbiased, and can be encoded in $b$ bits.

\paragraph{Notation.} The inputs to our client-side mechanism $\calM$ are: a continuous value $x \in [0, 1]$, a privacy parameter $\epsilon$ and a communication budget $b$. The output is a number $i \in \{0, \ldots, B - 1 \}$ where $B = 2^b$, represented as a sequence of $b$ bits. Additionally, we have an alphabet $A = \{ a_0, \ldots, a_{B - 1}\}$ shared between the clients and server; a number $i$ transmitted by a client is decoded as the letter $a_i$ in $A$. The purpose of $A$ is to ensure unbiasedness.

\subsection{Strategy Overview}

\begin{algorithm}[t]
\caption{Strategy for privacy-aware compression}
\label{alg:overview}
\begin{algorithmic}[1]
\STATE {\bf{Input:}} $x \in [0, 1]$, privacy budget $\epsilon$, communication budget $b = \bout$, input bit-width $\bin$.
\STATE{\bf{Offline phase:}}
\STATE Let $\Bout = 2^b, \Bin = 2^{\bin}$.
\STATE Construct sampling probability matrix $P \in \mathbb{R}^{\Bin \times \Bout}$ and output alphabet $A = \{a_0,\ldots,a_{\Bout-1}\}$ to satisfy $\epsilon$-DP and unbiasedness constraints.
\STATE{\bf{Online phase:}}
\STATE $i = (\Bin-1) \cdot \Dither(x) \in \{0,1,\ldots,\Bin-1\}$.
\STATE Draw $j \in \{0,\ldots,\Bout-1\}$ from the categorical distribution defined by probability vector $P_i$.
\STATE {\textbf{Return}} $a_j$.
\end{algorithmic}
\end{algorithm}

Our privacy-aware compression mechanism operates in two phases.
In the offline phase, it selects an input bit-width value $\bin$ and pre-computes an output alphabet $A$ and a sampling probability matrix $P \in \mathbb{R}^{\Bin \times \Bout}$, where $\Bout = 2^b, \Bin = 2^{\bin}$. Both $P$ and $A$ are shared with the server and all clients. In the online phase, the client-side mechanism $\calM$ first uses dithering to round an input $x \in [0,1]$ to the grid $\{0, \frac{1}{\Bin-1}, \ldots, 1\}$ while maintaining unbiasedness, and then draws an index $j$ from the categorical distribution defined by the probability vector $P_i$, where $i = (\Bin-1) \cdot \Dither(x)$. The client then sends the index $j$ to the server, which decodes it as the letter $a_j$. Algorithm~\ref{alg:overview} summarizes the procedure in pseudo-code. Note that the strategy generalizes to any bounded input range by scaling $x$ appropriately.
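
As a small illustration of the dithering step, assume that $\Dither$ is the usual unbiased stochastic rounding to the two neighboring grid points (this is an assumption made for the example; the precise definition is the one given elsewhere in the paper). Take $\bin = 2$, so that $\Bin = 4$ and the grid is $\{0, \frac{1}{3}, \frac{2}{3}, 1\}$, and let $x = 0.4 \in [\frac{1}{3}, \frac{2}{3}]$. Rounding up with probability $(\Bin-1)x - 1$ gives
\begin{equation*}
\Pr\left[\Dither(x) = \tfrac{2}{3}\right] = 0.2, \qquad \Pr\left[\Dither(x) = \tfrac{1}{3}\right] = 0.8, \qquad \mathbb{E}[\Dither(x)] = 0.8 \cdot \tfrac{1}{3} + 0.2 \cdot \tfrac{2}{3} = 0.4,
\end{equation*}
so the row index used in the sampling step is $i = 2$ with probability $0.2$ and $i = 1$ with probability $0.8$.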

%The first phase uses dithering to round $x$ into the grid $\{0, \frac{1}{\Bin-1}, \ldots, 1\}$ while maintaining unbiasedness. In the second phase, mechanism uses a pre-determined sampling probability matrix $P$ to 

In order for $\calM$ to satisfy $\epsilon$-DP and unbiasedness, we must impose the following constraints for the sampling probability matrix $P = [p_{i,j}]$ and output alphabet $A = \{a_j\}_{j=0}^{\Bout-1}$:
\vspace{-2ex}
\begin{subequations} \label{eq:constraints}
\begin{align}
    \text{Row-stochasticity:} & \quad \sum_{j=0}^{\Bout-1} p_{i,j} = 1 \quad \forall i \label{eq:row-stochastic} \\
    \text{Non-negativity:} & \quad p_{i,j} \ge 0 \quad \forall i,j \label{eq:non-negative} \\
    \text{$\epsilon$-DP:} & \quad p_{i',j} e^{-\epsilon} \le p_{i,j} \le p_{i', j} e^\epsilon \quad \forall i \ne i', \; \forall j \label{eq:dp-constraints} \\
    \text{Unbiasedness:} & \quad \sum_{j=0}^{\Bout-1} a_j p_{i,j} = \frac{i}{\Bin-1} \quad \forall i. \label{eq:unbiased}
\end{align}
\end{subequations}
Conditions \eqref{eq:row-stochastic} and \eqref{eq:non-negative} ensure that $P$ is a probability matrix. Condition \eqref{eq:dp-constraints} ensures $\epsilon$-DP, while condition \eqref{eq:unbiased} ensures unbiasedness. Note that these constraints only define the feasibility conditions for $P$ and $A$, and hence form the basis for a broad class of private mechanisms.
In the following sections, we show that two variants of an existing local DP mechanism, Randomized Response~\citep{warner1965randomized}, namely bit-wise Randomized Response and generalized Randomized Response, can be realized as special cases of this family of mechanisms. %We call the modified algorithms Unbiased Multiple Randomized Response and Unbiased RAPPOR.

%\kc{Move this earlier to Sec 2.4?}
%The dithering procedure works as follows. If $x \in [\frac{k}{B-1}, \frac{k+1}{B-1}]$ where $0 \leq k \leq B - 2$, then, we select $Z = \frac{k}{B-1}$ with probability $(B-1)(x - \frac{k}{B-1})$, and $Z = \frac{k+1}{B-1}$ with probability $1 - (B-1)(x - \frac{k}{B-1})$. It is not too hard to see that this ensures that $\bbE[Z] = x$. Hence-forward, for the second phase of each algorithm, we will assume that the input lies in the set $\{0, \frac{1}{B-1}, \ldots, 1\}$.

\subsection{Unbiased Bitwise Randomized Response}

Randomized Response (RR)~\citep{warner1965randomized} is one of the simplest LDP mechanisms; it sanitizes a single bit. Given a bit $y \in \{ 0, 1\}$, the RR mechanism outputs $y$ with some probability $p$ and the flipped bit $1 - y$ with probability $1 - p$. If $p = \frac{1}{1 + e^{-\epsilon}}$, then the mechanism is $\epsilon$-local DP.

\smallskip\noindent\textbf{Unbiased Bitwise Randomized Response Mechanism.} The RR mechanism does not directly apply to our task, as it is biased and handles only a single bit. We obtain unbiasedness by using the output alphabet $A = \{ - \frac{1}{e^{\epsilon} - 1}, \frac{e^{\epsilon}}{e^{\epsilon} - 1} \}$, and we handle $b$ bits by applying the one-bit mechanism to each of the $b$ bits of the dithered input, with a privacy budget of $\epsilon/b$ per bit (and with $\epsilon/b$ in place of $\epsilon$ in $A$).
It is not hard to see that unbiased RR with $b=1$ is a special case of Algorithm~\ref{alg:overview}. For $b>1$, we can construct the resulting probability matrix $P$ by applying unbiased RR to each bit independently and similarly obtain the resulting output alphabet $A$.

We prove in Appendix~\ref{sec:proofs} that Unbiased Bitwise RR satisfies $\epsilon$-local DP and is unbiased.
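
For the one-bit case ($b = 1$), both properties can be verified by hand. The following short check uses only the quantities defined above, with $p = \frac{e^{\epsilon}}{1 + e^{\epsilon}}$, $a_0 = -\frac{1}{e^{\epsilon}-1}$ and $a_1 = \frac{e^{\epsilon}}{e^{\epsilon}-1}$:
\begin{align*}
\mathbb{E}[\calM(1)] &= p \, a_1 + (1-p) \, a_0 = \frac{e^{2\epsilon} - 1}{(1 + e^{\epsilon})(e^{\epsilon} - 1)} = 1, \\
\mathbb{E}[\calM(0)] &= p \, a_0 + (1-p) \, a_1 = \frac{-e^{\epsilon} + e^{\epsilon}}{(1 + e^{\epsilon})(e^{\epsilon} - 1)} = 0.
\end{align*}
In the notation of Algorithm~\ref{alg:overview}, this corresponds to $\Bin = \Bout = 2$ and a $2 \times 2$ matrix $P$ with diagonal entries $p$ and off-diagonal entries $1-p$; the ratio of the two entries in each column is $p/(1-p) = e^{\epsilon}$, so constraint~\eqref{eq:dp-constraints} holds with equality.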
%additionally, it is unbiased, thus ensuring asymptotic consistency when used along with a server that averages the client responses.

\begin{comment}
\begin{algorithm}[t]
\caption{Unbiased Bitwise Randomized Response}
\label{alg:umrr}

\begin{algorithmic}[1]
\STATE {\bf{Input:}} $x \in [0, 1]$, privacy budget $\epsilon$, communication budget $b$.
\STATE Let $B = 2^b$.
\STATE $z = \Dither(x)$.
\FOR{$j = 1, \ldots, b$}
	\STATE $z_j$ be bit $j$ of $(B-1)z$.
	\STATE Set $y_j = z_j$ with probability $ \frac{1}{1 + e^{-\epsilon/b}}$, $y_j = 1 - z_j$ otherwise.
	\STATE Set $t_j = a_0 + y_j (a_1 - a_0)$ where  $a_0 =  - \frac{1}{e^{\epsilon/b} - 1}$ and $a_1 =  \frac{e^{\epsilon/b}}{e^{\epsilon/b} - 1}$.
\ENDFOR
\STATE {\textbf{Return}} $(t_1, t_2, \ldots, t_b)$.
\end{algorithmic}
\end{algorithm}
\end{comment}

%\begin{theorem}\label{thm:umrr}
%Unbiased Multiple Randomized Response satisfies $\epsilon$-local DP and is unbiased.
%\end{theorem}

%Here, UnbiasedRR is the following unbiased version of the Randomized Response mechanism.

%\begin{enumerate}
%\item {\bf{Inputs:}} A bit $y \in {0, 1}$. Privacy parameter $\epsilon$.
%\item \begin{eqnarray*}
%z & = y, & \text{with probability} \frac{1}{1 + e^{-\epsilon}} \\
%& = 1 - y, & \text{otherwise}
%\end{eqnarray*}
%\item Output $a_0 + z (a_1 - a_0)$ where $a_0 =  - \frac{1}{e^{\epsilon} - 1}$ and $a_1 =  \frac{e^{\epsilon}}{e^{\epsilon} - 1}$.
%\end{enumerate}

\subsection{Unbiased Generalized Randomized Response}

Generalized Randomized Response is a simple generalization of the one-bit RR mechanism for sanitizing a categorical value $x \in \{1, \ldots, K\}$. The mechanism transmits $x$ with some probability $p$, and a draw from a uniform distribution over $\{1, \ldots, K\}$ with probability $1 - p$. The mechanism satisfies $\epsilon$-local DP when $p = \frac{e^{\epsilon} - 1}{K + e^{\epsilon} - 1}$.

\smallskip\noindent\textbf{Unbiased Generalized Randomized Response.} We can adapt Generalized RR to our task by dithering the input $x$ to the grid $\{ 0, \frac{1}{\Bout-1}, \ldots, 1\}$ where $\Bout = 2^{\bout}$, and then transmitting the result using Generalized RR. Equivalently, the mechanism corresponds to the sampling probability matrix $P = \frac{e^{\epsilon} - 1}{\Bout + e^{\epsilon} - 1} I_{\Bout} + \frac{1}{\Bout + e^{\epsilon} - 1} \mathbf{1}\mathbf{1}^\top$, where $I_{\Bout}$ is the identity matrix and $\mathbf{1}$ is the all-ones vector of length $\Bout$.
However, this leads to a biased output. To address this, we change the alphabet to $A = \{ a_0, a_1, \ldots, a_{\Bout-1}\}$ such that unbiasedness is maintained.
Specifically, for any $i \in \{ 0, \ldots, \Bout - 1 \}$, we need to ensure that when the input is $\frac{i}{\Bout-1}$, the expected output is also $\frac{i}{\Bout-1}$, which reduces to the following equation:
\begin{equation*} \label{eqn:airappor}
 a_i \cdot \frac{e^{\epsilon} - 1}{\Bout + e^{\epsilon} - 1} + \sum_{j = 0}^{\Bout-1} a_j \cdot \frac{1}{ \Bout + e^{\epsilon} - 1} = \frac{i}{\Bout-1}.
\end{equation*}
Writing this down for each $i$ gives $\Bout$ linear equations, whose solution gives the values of $a_0, \ldots, a_{\Bout-1}$. We establish the privacy and unbiasedness properties of Unbiased Generalized RR in Appendix~\ref{sec:proofs}. A similar unbiased adaptation was also considered by \cite{balle2019privacy}. %The complete algorithm is shown in Algorithm~\ref{alg:rappor}. Theorem~\ref{thm:rappor} establishes its privacy and unbiasedness properties.
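
The linear system above appears to admit a simple closed form; the following is a sketch of the calculation rather than a statement taken from the appendix. Writing $S = \sum_{j=0}^{\Bout-1} a_j$ and summing the unbiasedness equation over $i = 0, \ldots, \Bout-1$ gives
\begin{equation*}
S \cdot \frac{e^{\epsilon} - 1}{\Bout + e^{\epsilon} - 1} + \Bout \cdot \frac{S}{\Bout + e^{\epsilon} - 1} = \sum_{i=0}^{\Bout-1} \frac{i}{\Bout-1} = \frac{\Bout}{2},
\end{equation*}
so that $S = \Bout/2$. Substituting back into the equation for a single $i$ yields
\begin{equation*}
a_i = \frac{1}{e^{\epsilon} - 1} \left( \frac{i \, (\Bout + e^{\epsilon} - 1)}{\Bout - 1} - \frac{\Bout}{2} \right).
\end{equation*}
As a sanity check, $\Bout = 2$ gives $a_0 = -\frac{1}{e^{\epsilon}-1}$ and $a_1 = \frac{e^{\epsilon}}{e^{\epsilon}-1}$, which is the alphabet of one-bit unbiased RR.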

\begin{comment}
\begin{algorithm}
\caption{Unbiased Generalized RR}
	\label{alg:rappor}
\begin{algorithmic}[1]
\STATE {\bf{Inputs:}} $x \in [0, 1]$, privacy budget $\epsilon$, communication budget $b$.
\STATE $z = \Dither(x)$.
\STATE Calculate $a_0, \ldots, a_{B-1}$ by solving Equation~\eqref{eqn:airappor}.
\STATE Draw $z'$ from a mixture of $\delta_{(B-1)z}$ and $Unif\{0, 1, \ldots, B - 1\}$ with mixing weights $\frac{e^{\epsilon} - 1}{B + e^{\epsilon} - 1}$ and $\frac{B}{B + e^{\epsilon} - 1}$.
\STATE {\bf{Return}}  $z'$.
\end{algorithmic}
\end{algorithm}
\end{comment}

%\begin{theorem}\label{thm:rappor}
%Algorithm~\ref{alg:rappor} satisfies $\epsilon$-local DP and is unbiased.
%\end{theorem}

\begin{figure*}[t]
\centering
\includegraphics[width=\linewidth]{UAI/figures/P_and_alpha_samples.pdf}
\caption{Optimized sampling probability matrix $P$ (top row) and output alphabet $A = \{a_0,\ldots,a_{\Bout-1}\}$ (bottom row) of the MVU mechanism with $\bin = \bout = 3$ for $\epsilon=1,3,5,10$. At $\epsilon=1$, the DP constraint forces entries in each column to be similar, and the unbiasedness constraint causes the magnitude of $a_j$ to be large. At $\epsilon=10$, the weaker DP constraint allows the optimal $P$ matrix to become close to the identity matrix and $a_j \approx j/(\Bout-1)$.}
\label{fig:p_and_alpha_samples}
\end{figure*}

\subsection{The MVU Mechanism}

A challenge with Unbiased Bitwise RR and Unbiased Generalized RR is that neither algorithm is intrinsically designed for ordinal or numerical values, which may result in poor accuracy upon aggregation. We next propose a new method that improves estimation accuracy by reducing the variance of each client's output while retaining unbiasedness and hence asymptotic consistency.

Our proposed method -- the \emph{Minimum Variance Unbiased} (MVU) mechanism --  addresses this problem by directly minimizing the variance of the client's output. This is done by solving the following optimization problem:
%The mechanism involves two steps. First we dither the input $x \in [0,1]$ to a value $z$ in the grid $\{0, \frac{1}{\Bin -1}, \dots, 1\}$. Then we randomly map $z$ to one of the values in the alphabet $A = \{a_0, a_1, \dots, a_{\Bout-1}\}$, where the values $a_j$ are to be determined. If $z = i / (\Bin-1)$, then it outputs $a_j$ with probability $p_{i, j}$. This gives us $(\Bin + 1) \Bout$ variables: the $a_j$s and the $p_{i, j}$s. We suppose that the initial dithering uses $\bin$ bits, and $\Bin = 2^{\bin}$. Similarly, the output uses $\bout$ bits and $\Bout = 2^{\bout}$.
%Given $\bin$, $\bout$, and $\epsilon$, we design the values $a_i$ and probabilities $p_{i,j}$ by solving the following optimization problem:
\begin{align} \label{eq:mvu_problem}
    \min_{\substack{p \in [0,1]^{\Bin \times \Bout} \\ a \in \mathbb{R}^{\Bout}}} & \quad \sum_{i=0}^{\Bin-1} \sum_{j=0}^{\Bout-1} p_{i,j} \left(\frac{i}{\Bin-1} - a_j\right)^2 \\
    \text{subject to} & \quad \text{Conditions } \eqref{eq:row-stochastic}\text{--}\eqref{eq:unbiased}. \nonumber
\end{align}
The objective in \eqref{eq:mvu_problem} is, up to a factor of $1/\Bin$, the variance of the mechanism's output when the input is uniformly distributed over the grid $\{0, \frac{1}{\Bin-1}, \dots, 1\}$. Conditions \eqref{eq:row-stochastic}--\eqref{eq:unbiased} ensure that the MVU mechanism is $\epsilon$-DP and unbiased, hence satisfying the requirements of our task.
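
To spell out the link between the objective and the variance, write $z_i = i/(\Bin-1)$ for the $i$-th grid point (a shorthand introduced only for this remark). Under the unbiasedness constraint~\eqref{eq:unbiased}, the output for input $z_i$ has mean $z_i$, so its variance is
\begin{equation*}
\mathrm{Var}\left[\calM(z_i)\right] = \sum_{j=0}^{\Bout-1} p_{i,j} \left( a_j - z_i \right)^2 .
\end{equation*}
Averaging over $i$ drawn uniformly from $\{0, \ldots, \Bin-1\}$ gives $\frac{1}{\Bin} \sum_{i=0}^{\Bin-1} \sum_{j=0}^{\Bout-1} p_{i,j} \left( z_i - a_j \right)^2$, which equals the objective in \eqref{eq:mvu_problem} divided by the constant $\Bin$; the two therefore have the same minimizers.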
%The constraints~\eqref{eq:row-stochastic} and~\eqref{eq:non-negative} ensure that the parameters $(p_{i,j})$ define a valid probability distribution. Constraint~\eqref{eq:dp-constraints} ensure the resulting mechanism provides $\epsilon$-DP, and~\eqref{eq:unbiased} ensures the mechanism is unbiased. The final algorithm is stated in Algorithm~\ref{alg:mvu}, and
%\mike{Thanks for adding that figure! There probably isn't space in the main paper, but how about adding a similar figure in the appendix with fixed $\epsilon$ and varying $\bin$ and $\bout$? That could also be used to support the claim that for small $\epsilon$ there's no benefit to using many bits.}

% Probability constraints ensure that for all $i$, we have:
% \begin{equation} \label{eqn:qpprob}
% 	\sum_{j=0}^{B-1} p_{i, j} = 1, \quad p_{i, j} \geq 0
% \end{equation}
% Additionally, differential privacy requires that for all $i \neq i'$ and for all $j$,
% \begin{equation} \label{eqn:qpdp}
%  p_{ i', j} e^{-\epsilon} \leq p_{i, j} \leq p_{i', j} e^{\epsilon}
% \end{equation}
% Finally, we require that the output is unbiased; in other words, when $z = \frac{i}{B-1}$, the expected value of the output is also $\frac{i}{B-1}$. This constraint can be encoded as the following equation that applies to every $i \in \{ 0, \ldots, B-1\}$.
% \begin{equation} \label{eqn:qpexp}
%  \sum_{j=0}^{B-1} a_j p_{i, j} = \frac{i}{B-1}
% \end{equation}
% Any set of $a_j$s and $p_{i, j}$s that satisfy these constraints represent a feasible mechanism. To obtain the best one out of the feasible set, we propose to minimize the variance, when the input $i$ is drawn uniformly from the grid $\{0, 1/(B-1), \ldots, 1 \}$. This variance can be written as follows:
% \begin{equation} \label{eqn:qpvar}
% \sum_{i=0}^{B-1} \sum_{j=0}^{B-1} p_{i, j} \left( \frac{i}{B-1} - a_j \right)^2
% \end{equation}
% The final algorithm is stated in Algorithm~\ref{alg:qp}. Observe that by construction, it is $\epsilon$-local DP, and is unbiased. Additionally, if the input lies on the grid $\{0, 1/(B-1), \ldots, 1\}$ and we can solve the optimization problem optimally, the mechanism by construction has optimal variance.

% The client mechanism will output a letter from the alphabet $A = \{ a_0, a_1, \ldots, a_{B-1} \}$ where the values of $a_i$ are to be determined. Additionally, if $z = i / (B-1)$, then it outputs $a_j$ with probability $p_{i, j}$. This gives us $B^2 + B$ variables: the $a_j$s and the $p_{i, j}$s. Probability constraints ensure that for all $i$, we have:
% \begin{equation} \label{eqn:qpprob}
% 	\sum_{j=0}^{B-1} p_{i, j} = 1, \quad p_{i, j} \geq 0
% \end{equation}
% Additionally, differential privacy requires that for all $i \neq i'$ and for all $j$,
% \begin{equation} \label{eqn:qpdp}
%  p_{ i', j} e^{-\epsilon} \leq p_{i, j} \leq p_{i', j} e^{\epsilon}
% \end{equation}
% Finally, we require that the output is unbiased; in other words, when $z = \frac{i}{B-1}$, the expected value of the output is also $\frac{i}{B-1}$. This constraint can be encoded as the following equation that applies to every $i \in \{ 0, \ldots, B-1\}$.
% \begin{equation} \label{eqn:qpexp}
%  \sum_{j=0}^{B-1} a_j p_{i, j} = \frac{i}{B-1}
% \end{equation}
% Any set of $a_j$s and $p_{i, j}$s that satisfy these constraints represent a feasible mechanism. To obtain the best one out of the feasible set, we propose to minimize the variance, when the input $i$ is drawn uniformly from the grid $\{0, 1/(B-1), \ldots, 1 \}$. This variance can be written as follows:
% \begin{equation} \label{eqn:qpvar}
% \sum_{i=0}^{B-1} \sum_{j=0}^{B-1} p_{i, j} \left( \frac{i}{B-1} - a_j \right)^2
% \end{equation}
% The final algorithm is stated in Algorithm~\ref{alg:qp}. Observe that by construction, it is $\epsilon$-local DP, and is unbiased. Additionally, if the input lies on the grid $\{0, 1/(B-1), \ldots, 1\}$ and we can solve the optimization problem optimally, the mechanism by construction has optimal variance.

% \mike{We could also write the formulation a little more generally, where the number of lattice points at the input is different from the number at the output. Would that be useful?}

\begin{comment}
\begin{algorithm}[t]
\caption{The MVU Mechanism \label{alg:mvu}}
	\begin{algorithmic}[1]
\STATE {\bf{Inputs:}} $x \in [0, 1]$, privacy budget $\epsilon$, dithering budget $\bin$ and communication budget $\bout$.
\STATE Let $\Bin = 2^{\bin}$ and $\Bout = 2^{\bout}$.
\STATE $z = \Dither(x)$.
% \STATE Solve the optimization problem with objective~\eqref{eqn:qpvar} and constraints~\eqref{eqn:qpprob},\eqref{eqn:qpdp} and~\eqref{eqn:qpexp} to calculate $a_0, \ldots, a_{B-1}$ and probabilities $p_{i, j}$.
\STATE Solve the optimization problem~\eqref{eq:mvu_problem} to obtain $a_0, \ldots, a_{\Bout-1}$ and probabilities $p_{i, j}$.
\STATE If $z = \frac{i}{\Bin-1}$ then set $z' = a_j$ with probability $p_{i, j}$.
\STATE {\bf{Return}} $z'$.
	\end{algorithmic}
\end{algorithm}
\end{comment}

\smallskip\noindent\textbf{Solving the MVU mechanism design problem.} We solve \eqref{eq:mvu_problem} using one of two approaches, depending on the size of the probability matrix $P$ and on $\epsilon$. For smaller problems and when $\epsilon$ is not too small, we use a trust-region interior-point solver~\citep{conn2000trust}. As $\epsilon$ approaches $0$, the problem becomes poorly conditioned, and we solve it only approximately by relaxing the unbiasedness constraint~\eqref{eq:unbiased}. In this case we use an alternating minimization heuristic: we alternate between fixing the values $a_j$ and solving for $p_{i,j}$, and fixing $p_{i,j}$ and solving for $a_j$, while incorporating constraint~\eqref{eq:unbiased} as a soft penalty in the objective. Each of the corresponding subproblems is a quadratic program and can be solved efficiently. Figure~\ref{fig:p_and_alpha_samples} shows examples of the MVU mechanism for $\bin = \bout = 3$ and $\epsilon \in \{1, 3, 5, 10\}$ obtained using the trust-region solver.

\smallskip\noindent\textbf{Relationship between DP and compression.} The MVU mechanism highlights an intriguing connection between DP and compression. Since the mechanism hides information in the input $x$ by perturbing it with random noise, as $\epsilon \rightarrow 0$, fewer bits are required to describe the noisy output $\calM(x)$. In the limiting case of $\epsilon=0$, all information is lost and the output can be described by zero bits. In Appendix \ref{sec:experiment_details}, we demonstrate this argument concretely by showing that as $\epsilon \rightarrow 0$, the marginal benefit of having a larger communication budget decreases. %\cg{Is this worth including if we don't have space for any concrete results?}

%\paragraph{Metric DP.} The metric DP constraint is a slight modification of Equation \ref{eqn:qpdp}:
%\begin{equation} \label{eqn:qpdp_metric}
% p_{j, i'} e^{-\epsilon |i-i'| / (B-1)} \leq p_{j, i} \leq p_{j, i'} e^{\epsilon |i-i'| / (B-1)},
%\end{equation}
%where the level of privacy protection varies depending on the distance in input space. Alternatively, we can use a quadratic privacy penalty instead of the absolute penalty in Equation \ref{eqn:qpdp_metric}:
%\begin{equation} \label{eqn:qpdp_qmetric}
% p_{j, i'} e^{-\epsilon (i-i')^2 / (B-1)^2} \leq p_{j, i} \leq p_{j, i'} e^{\epsilon (i-i')^2 / (B-1)^2}.
%\end{equation}
%Doing so allows us to handle vector-valued inputs at a given $L_2$-sensitivity.
%Suppose that $\bx^{(0)}, \bx^{(1)} \in \{0,\frac{1}{B-1},\ldots,1\}^d$ with $\| \bx^{(0)} - \bx^{(1)} \|_2 \leq \Delta$ and let $\bz$ be the randomized output according to the optimal mechanism with (quadratic) metric DP condition (Equation \ref{eqn:qpdp_qmetric}). Then:
%\begin{align*}
%    \frac{\mathbb{P}(\bz | \bx^{(1)})}{\mathbb{P}(\bz | \bx^{(0)})} &= \prod_{i=1}^d \frac{\mathbb{P}(\bz_i | \bx^{(1)})}{\mathbb{P}(\bz_i | \bx^{(0)})} \\
%    &\leq \prod_{i=1}^d \exp(\epsilon (\bx^{(0)}_i - \bx^{(1)}_i)^2) \\
%    &= \exp \left( \epsilon \sum_{i=1}^d (\bx^{(0)}_i - \bx^{(1)}_i)^2 \right) \leq \exp(\epsilon \Delta^2),
%\end{align*}
%and similarly for the lower bound $\exp(-\epsilon \Delta^2) \leq \mathbb{P}(\bz | \bx^{(1)}) / \mathbb{P}(\bz | \bx^{(0)})$. Thus if the optimal mechanism is (quadratic) metric $\epsilon$-DP with respect to each coordinate of the vector $\bx$ then it is $\epsilon \Delta^2$-DP with respect to the vector $\bx$ at $L_2$-sensitivity $\Delta$.



\section{Extensions}
\label{sec:extension}

%Many practical applications of federated data analysis go beyond simple statistics on scalars to more complex aggregate statistics metrics \cg{What does this mean?} and vectors. 
We now show how to extend the MVU mechanism to obtain privacy-aware and accurate compression mechanisms for metric-DP and vector spaces.

\subsection{Metric DP}

%An important application of federated data analysis is
In location privacy, client devices send their obscured locations to a central server for aggregation. Metric DP (Definition~\ref{def:metricdp}) is a variation of LDP that applies to this use case. We are given a position $x$ and a metric $d$ that measures how far apart two positions are. Our goal is to output a private position $x'$ so that fine-grained properties of $x$ (such as exact address or city block) are hidden, while coarse-grained properties (such as city or zip code) are preserved. %Here, distance is measured according to the input metric $d$.

We show how to adapt the MVU mechanism to metric DP. For simplicity, suppose that we measure position on the line, so $x \in [0, 1]$.  We modify Condition~\eqref{eq:dp-constraints} to instead satisfy the metric DP constraint with respect to the metric $d$:
%and we seek to output an $\epsilon$-metric DP version of $x$ for the metric $d$.
%As before, we can dither $x$ to the grid $\{ 0, 1/(B-1), \ldots, 1\}$, and then probabilistically map the dithered result $z$ to a letter in the alphabet $A$. The only modification is in the privacy constraints -- which change from local DP to metric DP; these can be written as:
\begin{equation} \label{eqn:qpmetricdp}
	p_{i', j} e^{-\epsilon d(i/(\Bin-1), i'/(\Bin-1))} \leq p_{ i, j} \leq p_{ i', j} e^{\epsilon d( i/(\Bin-1), i'/(\Bin-1))}.
\end{equation}
Thus we obtain an MVU mechanism for metric DP by solving the optimization problem \eqref{eq:mvu_problem} with Condition~\eqref{eq:dp-constraints} replaced by \eqref{eqn:qpmetricdp}, and then following the procedure in Algorithm~\ref{alg:overview}.
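
As an illustrative special case (the choice of metric here is ours, for the sake of example), take $d(y, y') = |y - y'|$. Constraint~\eqref{eqn:qpmetricdp} then reads
\begin{equation*}
p_{i', j} \, e^{-\epsilon |i - i'| / (\Bin-1)} \leq p_{i, j} \leq p_{i', j} \, e^{\epsilon |i - i'| / (\Bin-1)} .
\end{equation*}
Adjacent grid points may differ by a factor of at most $e^{\epsilon/(\Bin-1)}$ in each column, while the two endpoints $i = 0$ and $i' = \Bin-1$ may differ by a factor of $e^{\epsilon}$, as in the local DP constraint~\eqref{eq:dp-constraints}. For this particular metric, multiplying the constraints along a chain of adjacent grid points recovers the constraint for any pair $(i, i')$, so it suffices to impose the $\Bin - 1$ adjacent-pair constraints per column.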

\subsection{Extension to Vector Spaces}

We next look at extending the MVU mechanism to vector spaces. Specifically, a client now holds a $d$-dimensional vector $\bx$ in a domain $\calX \subseteq \mathbb{R}^d$, and its goal is to output an $\epsilon$-local DP version that can be communicated in $bd$ bits. The domain $\calX$ is typically a unit $L_p$-norm ball for $p \geq 1$.

%A second important application of federated statistics is in federated learning, where the central server aims to iteratively learn a machine learning model based on sensitive data held by the clients. At each iteration, client devices send a private gradient vector that is aggregated at the server to update the learned model. Thus to adapt our privacy-aware compression methods for federated learning, we need to extend them to vectors.
%an important question is how to extend these privacy-aware compression methods to vectors.
%Specifically, the problem setting is as follows. 

A plausible approach is to apply the scalar MVU mechanism independently for each coordinate of $\bx$. While this will provide the optimal accuracy for $p = \infty$, for $p < \infty$, the client's variance will be higher. A second approach is to extend the MVU mechanism directly to $\calX$ by using an alphabet $A \times A \times \ldots \times A = A^d$ and then solving the corresponding optimization problem~\eqref{eq:mvu_problem}. Unfortunately this is computationally intractable even for moderate $d$.

%\cg{Is it a variance problem or a privacy bound problem?}

Instead, we show how to obtain a more computationally tractable approximation when $\calX$ is an $L_p$-ball. We are motivated by the following lemma.

%\kc{need to adjust the constants here}

\begin{lemma} \label{lem:metrictovector}
Let $\calX$ be an $L_p$-ball with diameter $\Delta$. Suppose $\calM$ is an $\epsilon$-metric DP scalar mechanism with $d(y, y') = |y - y'|^p$. Then, the mechanism $\calM_d: \calX \rightarrow \mathbb{R}^d$ that maps $\bx$ to the vector $(\calM(\bx_1), \ldots, \calM(\bx_d))$ is $\epsilon \Delta^p$-local DP. Additionally, if $\calM$ is unbiased, then $\calM_d$ is unbiased as well.
\end{lemma}
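
A brief sketch of the reasoning behind Lemma~\ref{lem:metrictovector}, assuming that $\calM$ is applied to each coordinate with independent randomness and that $\| \bx - \bx' \|_p \leq \Delta$ for all $\bx, \bx' \in \calX$: for any output vector $\bz$,
\begin{align*}
\frac{\Pr[\calM_d(\bx) = \bz]}{\Pr[\calM_d(\bx') = \bz]}
&= \prod_{l=1}^d \frac{\Pr[\calM(\bx_l) = \bz_l]}{\Pr[\calM(\bx'_l) = \bz_l]}
\leq \prod_{l=1}^d \exp\left( \epsilon \, |\bx_l - \bx'_l|^p \right) \\
&= \exp\left( \epsilon \, \| \bx - \bx' \|_p^p \right) \leq \exp\left( \epsilon \Delta^p \right).
\end{align*}
Unbiasedness of $\calM_d$ follows coordinate-wise from the unbiasedness of $\calM$.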

Lemma~\ref{lem:metrictovector} suggests the following algorithm: apply the MVU mechanism for $\epsilon$-metric DP with $d(y, y') = |y - y'|^p$ to each coordinate, then combine the outputs to obtain an $\epsilon \Delta^p$-local DP mechanism for vectors with $L_p$-sensitivity $\Delta$. Since $\| \cdot \|_\infty \leq \| \cdot \|_p$, each coordinate of $\bx$ lies in the bounded range $[-\Delta,\Delta]$, so we can rescale $\bx' \leftarrow (\bx + \Delta) / (2\Delta)$ so that all entries belong to $[0,1]$ and the MVU mechanism can be applied to $\bx'$. Note that this scaling operation changes the $L_p$-sensitivity to $1/2$.

This solution is computationally tractable since we only need to solve the optimization problem for the scalar MVU mechanism, which involves $\approx \Bout^2 = 2^{2\bout}$ variables and constraints (instead of $\approx 2^{2\bout d}$). We investigate how this mechanism works in practice in Section~\ref{sec:experiments}.

\subsection{Composition using R\'{e}nyi-DP}
\label{sec:composition}

%Repeated applications of MVU mechanisms follows the usual privacy accounting for composition. However, \cite{mironov2017renyi} showed that this method of composition tends to over-estimate privacy leakage, and instead advocated for privacy accounting and composition using R\'{e}nyi differential privacy (RDP), which can be converted back to the usual $(\epsilon,\delta)$-DP guarantee. 

Repeated applications of the MVU mechanism will give an additive sequential privacy composition guarantee as in standard $\epsilon$-DP. We next show how to get tighter composition bounds for the MVU mechanism using RDP accounting as in~\cite{mironov2017renyi}.
%for any R\'{e}nyi order $\alpha > 1$.

Suppose that $\bx, \bx' \in \{0,1/(\Bin-1),\ldots,1\}^d$ are quantized $d$-dimensional vectors, and let $Q_0, Q_1$ be the output distributions of the mechanism $\calM$ for inputs $\bx, \bx'$, respectively. By the definition of R\'{e}nyi divergence~\citep{renyi1961measures},
\begin{equation*}
D_\alpha(Q_0 || Q_1)
%&= \frac{1}{\alpha-1} \log \mathbb{E}_{\bz \sim Q_1} \frac{Q_0(\bz)^\alpha}{Q_1(\bz)^\alpha} \\
%&= \frac{1}{\alpha-1} \log \prod_{l=1}^d \mathbb{E}_{\bz_l} \frac{Q_0(\bz_l)^\alpha}{Q_1(\bz_l)^\alpha} \quad \text{by independence} \\
= \frac{1}{\alpha-1} \sum_{l=1}^d \log \sum_{j=0}^{\Bout-1} \frac{p_{\bi_l,j}^\alpha}{p_{\bi_l',j}^{\alpha-1}},
\end{equation*}
where $\bi, \bi' \in \{0,1,\ldots,\Bin-1\}^d$ are such that $\bx = \bi / (\Bin-1)$ and $\bx' = \bi' / (\Bin-1)$, and the inner sum runs over the $\Bout$ possible outputs for each coordinate. Let $D^\alpha$ denote the $\Bin \times \Bin$ matrix with entries $D^\alpha_{i,i'} = \frac{1}{\alpha-1} \log \sum_{j=0}^{\Bout-1} p_{i,j}^\alpha / p_{i',j}^{\alpha - 1}$. Then, computation of the $\alpha$-RDP parameter for $\calM$ can be formulated as the following combinatorial optimization problem:
\begin{equation*}
\label{eq:comb_opt}
\max_{\bi, \bi' \in \{0,1,\ldots,\Bin-1\}^d} \: \sum_{l=1}^d D^\alpha_{\bi_l, \bi_l'} 
\quad \text{s.t. } \: \| \bi - \bi' \|_p^p \leq (\Bin-1)^p \Delta^p.
\end{equation*}
This optimization problem is in fact an instance of the \emph{multiple-choice knapsack problem}~\citep{sinha1979multiple} and admits an efficient linear program relaxation by converting the integer vectors $\bi, \bi'$ to probability vectors, \emph{i.e.},
\begin{align}
\label{eq:comb_opt_lp}
\max_{\bp \in \mathbb{R}^{d \times \Bin \times \Bin}} &\quad \sum_{l=1}^d \langle D^\alpha, \bp_l \rangle_F \\
\text{subject to } &\quad \sum_{l=1}^d \langle C, \bp_l \rangle_F \leq (\Bin-1)^p \Delta^p \nonumber \\
&\quad \sum_{i,j} (\bp_l)_{ij} \leq 1 \text{ and } \bp_l \geq 0 \; \forall l, \nonumber
\end{align}
where $\langle \cdot, \cdot \rangle_F$ denotes the Frobenius (vectorized) inner product and $C$ denotes the distance matrix with entries $C_{ij} = |i - j|^p$.
This LP relaxation can still be prohibitively expensive to solve for large $d$ since $\bp$ contains $d\Bin^2$ variables. Fortunately, in such cases, we can obtain an upper bound via the greedy solution; see Appendix \ref{sec:proofs} for the proof.

\begin{lemma}\label{lem:greedy}
Let $(i^*, j^*) = \argmax_{i \neq j} D^\alpha_{ij} / C_{ij}$ and let $d_0 = (\Bin-1)^p \Delta^p / C_{i^* j^*}$. Then the optimal value of \eqref{eq:comb_opt_lp} is at most $d_0 D_{i^* j^*}^\alpha$.
\end{lemma}
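
The intuition behind Lemma~\ref{lem:greedy} can be sketched in one line (the full proof is the one in the appendix). The diagonal entries satisfy $D^\alpha_{ii} = 0$ and $C_{ii} = 0$, so only off-diagonal entries contribute, and for any feasible $\bp$,
\begin{equation*}
\sum_{l=1}^d \langle D^\alpha, \bp_l \rangle_F
= \sum_{l=1}^d \sum_{i \neq j} \frac{D^\alpha_{ij}}{C_{ij}} \, C_{ij} \, (\bp_l)_{ij}
\leq \frac{D^\alpha_{i^* j^*}}{C_{i^* j^*}} \sum_{l=1}^d \langle C, \bp_l \rangle_F
\leq \frac{D^\alpha_{i^* j^*}}{C_{i^* j^*}} \, (\Bin-1)^p \Delta^p
= d_0 \, D^\alpha_{i^* j^*},
\end{equation*}
where $(i^*, j^*)$ maximizes the ratio $D^\alpha_{ij} / C_{ij}$ over off-diagonal pairs. In knapsack terms, the bound spends the entire distance budget on the pair with the best divergence-to-cost ratio.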

%\begin{observation}
%\label{obs:greedy_optimal}
%Let $(i^*, j^*) = \argmax_{i,j} D^\alpha_{ij} / C_{ij}$ and let $d_0 = (B-1)^p \Delta^p / C_{i^* j^*}$. If $d > d_0$, the greedy solution $(\bp_l)_{i^* j^*} = 1$ for $l=1,\ldots,\lfloor d_0 \rfloor$ and $(\bp_{\lfloor d_0 \rfloor + 1})_{i^* j^*} = d_0 - \lfloor d_0 \rfloor$ is optimal for Equation \ref{eq:comb_opt_lp}.
%\end{observation}

To summarize, for composition with RDP accounting at order $\alpha$, we can either solve the LP relaxation in \eqref{eq:comb_opt_lp} or compute the greedy solution to obtain an upper bound for $D_\alpha(Q_0 || Q_1)$, and then apply the usual composition for RDP.
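
For concreteness, the standard accounting of~\cite{mironov2017renyi} then proceeds as follows; the symbols $\rho$, $T$ and $\delta$ are introduced here only for illustration. If $\rho(\alpha)$ denotes the upper bound on $D_\alpha(Q_0 || Q_1)$ obtained above, then $T$ applications of the mechanism satisfy $(\alpha, T \rho(\alpha))$-RDP, which for any $\delta \in (0, 1)$ implies $(\epsilon', \delta)$-DP with
\begin{equation*}
\epsilon' = T \rho(\alpha) + \frac{\log(1/\delta)}{\alpha - 1} .
\end{equation*}
In practice one would evaluate this expression over a range of orders $\alpha > 1$ and report the smallest resulting $\epsilon'$.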


\begin{figure*}[t!]
\centering
\includegraphics[width=\linewidth]{UAI/figures/dme_scalar_b1.pdf}
\caption{Distributed mean estimation for scalar data with LDP $\epsilon=1,3,5$. The MVU mechanism with budget $b=1$ recovers the CLDP mechanism and the two curves coincide, while with $b=3$ MVU attains a low variance across all input values compared to the baseline mechanisms. See text for details.%\mike{My vision is not as good as it used to be, but I have a hard time differentiating between the colors for RAPPOR and CLDP in this figure. Also, the figure can be edited to get rid of the top part (already appears in Fig 1).} \cg{Is this color scheme better?} \mike{Yes, much better for me, thanks!}
}
\label{fig:dme_scalar_comparison}
\end{figure*}
