
\section{Problem Formulation}
\label{sec:mechanism_design}
Consider a quantization mechanism $$\mathcal{M}: \mathcal{X}\rightarrow \{B_1, B_2, \cdots , B_m\}$$ used for quantizing a scalar $x \in [-c,c]:=\mathcal{X}$, where $$-c-\Delta=B_1< B_2 < \cdots < B_m=c+\Delta,$$ $m$ is the number of quantization bins, and $\Delta\geq0$ extends the output range beyond the input range. Note that the $m$ bins are not necessarily uniformly spaced. Throughout, the capital letter $X$ denotes the random input and the lowercase letter $x$ its realization. Our goal is to design $\mathcal{M}$ (including the bin values $B_1,\cdots,B_m$ and $\Delta$) so that it is 1) differentially private; 2) unbiased, i.e., $\mathbb{E}(\mathcal{M}(x)) = x, \forall x$; and 3) accurate, i.e., the mean absolute error $\mathbb{E}(|\mathcal{M}(X)-X|)$ is minimized.

\subsection{Background: differential privacy}
\input{uai2024/prelim}


\subsection{Proposed quantization mechanism}\label{subsec:mechanism}
Next, we present our mechanism $\mathcal{M}$, which quantizes its input with a DP guarantee.
%In this section, we start with proposing a mechanism for quantizing scalar  $x \in [-c,c]:=\mathcal{X}$. Let $\linebreak\mathcal{M}: \mathcal{X}\rightarrow \{B_1, B_2, \dots , B_m\}$ be the quantization mechanism, where $m$ is the number of quantization bins, $-c-\Delta=B_1< B_2 \ldots < B_m=c+\Delta$, and $\Delta$ is a non-negative number.  It is worth noting that these bins are not necessarily uniformly distributed. As we will show in  \S~\ref{subsec:gen}, our mechanism can adapt to the non-uniform output bins to achieve better performance. 
Given a set of bins $\{B_1,B_2,\cdots, B_m\}$, $\mathcal{M}$ takes the following steps to quantize a scalar $x$:
%Let $\mathcal{M}(x)$ be the 
% the mechanism for quantizing a scalar $x \in [-c, c]$.
% To do that, The output range will be extended by $\Delta$ to $[-c-\Delta, c+\Delta]$. Our mechanism will randomly map $x$ to one of the $m$ bins, which are distributed in the output range, denoted as $\{B_1, B_2, \dots , B_m\}$, where $B_1=-c$ and $B_m=c$. 
\begin{enumerate}[leftmargin=*]
    \item For any $x$, select two bins $B_l, B_r\in \{B_1,B_2,\cdots, B_m\}$ randomly based on a pre-defined \textit{selection distribution}, with $B_l \leq x$ located on the left side of $x$ and $B_r > x$ on the right side of $x$. In other words, if $x\in [B_j,B_{j+1})$, then $l \in \{1,\ldots,j\}$ and $r \in \{j+1,\ldots,m\}$.
     %one bin located on the left side of $x$ (denoted as $B_{l}$), and another bin on the right side (denoted as $B_{r}$), according to a pre-defined \textit{selection distribution}. More precisely, if $x\in [B_{j},B_{j+1})$, then mechanism $\mathcal{M}$ picks $l \in \{1,\ldots,j\}$ and $r \in \{j+1,\ldots,m\}$ at random with a certain probability distribution. 
    \item Then, $\mathcal{M}$ randomly outputs either $B_l$ or $B_r$ according to
\begin{equation}\label{equ:dither}
            \mathcal{M}(x) = 
                \begin{cases}
                    B_l, & \text{with probability (w.p.)}~ \frac{B_r-x}{B_r-B_l}; \\
                    B_r, & \text{with probability (w.p.)}~ \frac{x-B_l}{B_r-B_l}.
                \end{cases}
        \end{equation}
\end{enumerate}
Given \eqref{equ:dither}, it is easy to verify that the mechanism $\mathcal{M}$ is unbiased, i.e., $\mathbb{E}(\mathcal{M}(x)) = x, \forall x$.  Our goal is to design the \textit{selection distribution} in the first step such that the mean absolute error  $\mathbb{E}(|\mathcal{M}(X)-X|)$ is minimized. In this paper, we assume that the bin values $\{B_1,\cdots, B_m\}$ are symmetric unless otherwise stated, i.e., $B_{i} = -B_{m+1-i}$, $\forall i\in [m]$.   %The second step is to make sure that the expected value of the output $\mathbb{E}(\mathcal{M}(x)) = x$, hence achieving unbiasedness.
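
To see why unbiasedness holds, it suffices to condition on the selected pair $(B_l,B_r)$; the following short check uses only \eqref{equ:dither}:
\begin{align*}
\mathbb{E}\left(\mathcal{M}(x)\mid l,r\right) &= B_l\,\frac{B_r-x}{B_r-B_l} + B_r\,\frac{x-B_l}{B_r-B_l} \\
&= \frac{x\,(B_r-B_l)}{B_r-B_l} = x.
\end{align*}
Since the identity holds for every admissible pair $(l,r)$, averaging over any selection distribution preserves it.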

\paragraph{Selection distribution.} The selection distribution determines the probability of selecting each bin on the left (or right) of the input $x$ in the first step of our mechanism. For $x \in [B_j, B_{j+1})$, the mechanism selects a left index $l\in \{1,\cdots,j\}$ and a right index $r\in \{j+1,\cdots,m\}$. Let $L_j$ and $R_j$ be the random variables associated with the left index $l$ and the right index $r$, respectively, when $x \in [B_j, B_{j+1})$. Since the probability mass functions (PMFs) of both $L_j$ and $R_j$ depend on the value of $j$, we use the two functions $q_j$ and $q_{m-j}$ to denote them:
\begin{align*}
  &\Pr\{L_j=i\}:=q_j(i),&i\in \{1,\cdots,j\}\\
  &\Pr\{R_j = i\}:=q_{m-j}(m+1 - i),&i \in \{j+1,\ldots,m\}
\end{align*}
Note that $q_1(1) = 1$. See Figure~\ref{fig:dis} for an illustration.
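
As a small illustration of this indexing (with $m=4$ chosen here only for concreteness), the same functions $q_n$ serve both sides, mirrored so that a larger argument always corresponds to a bin closer to $x$:
\begin{align*}
x\in[B_2,B_3):\quad &\Pr\{L_2=1\}=q_2(1),~\Pr\{L_2=2\}=q_2(2),\\
&\Pr\{R_2=3\}=q_2(2),~\Pr\{R_2=4\}=q_2(1);\\
x\in[B_1,B_2):\quad &\Pr\{L_1=1\}=q_1(1)=1,\\
&\Pr\{R_1=i\}=q_3(5-i),~i\in\{2,3,4\}.
\end{align*}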

%, the number of bins on the left side of $x$. If there are $j$ bins on the left, then pmf of $L$ is denoted by $q_j(i)$. That is, $p\{L_j=i\} = q_j(i), i\in \{1,\ldots,j\}$. Note that $q_1(1) = 1$. Similarly, we use function $q_{m-j}(.)$ for pmf $R$. If there are $m-j$ bins on the ride side of $x$, then $p\{R_j = i\} = q_{m-j}(m+1 - i), i \in \{j+1,\ldots,m\}$ (see Figure \ref{fig:dis} for illustration of pmf of $L$ and $R$). 

Since both the selection distributions $q_j(\cdot)$, $j\in \{1,2,\cdots,m\}$, and the bin values $\{B_1,\cdots, B_m\}$ are parameters of the mechanism $\mathcal{M}$, we need to design them carefully to minimize the absolute error while satisfying the DP constraint. We detail how to find these parameters in Section~\ref{subsec:gen}. Given $q_j(\cdot)$ and $\{B_1,\cdots, B_m\}$, Algorithm~\ref{alg:gen} summarizes our mechanism $\mathcal{M}$.


%The distribution of selecting one bin from $n$ optional bins is denoted as $q_n$. The probability of selecting the $i$-th bin among $n$ bins is denoted as $q_n(i)$ ($q_1(1)=1$). Here we use larger index $i$ to denote bins that are closer to $x$. Denote the event that the bin $B_{i}$ on the left is selected as $\mathcal{L}(i)$, and the bin $B_{i}$ on the right is selected as $\mathcal{R}(i)$. Hence $p(\mathcal{L}(i))=q_{j}(i), (i \in \{1, \dots, j\})$, and $p(\mathcal{R}(k))=q_{m-j}(m-k+1), (k \in \{j+1, \dots, m\})$. 


\begin{algorithm}[tb]
%{\small
\caption{Proposed quantization mechanism $\mathcal{M}$}
   \label{alg:gen}
\begin{algorithmic}[1]
   \STATE {\bfseries Input:} bin values $\{B_1, \cdots, B_m\}$, input $x \in [B_j, B_{j+1})$, PMF $q_j,q_{m-j}$ of $L_j$ and $R_j$. 
   \STATE $l \gets i$ w.p. $q_j(i)$, $i \in \{1,\cdots,j\}$.
   \STATE $r \gets i$ w.p. $q_{m-j}(m+1-i)$, $i \in \{j+1,\cdots,m\}$.
   \STATE $\mathcal{M}(x) \gets B_l$ w.p. $\frac{B_r-x}{B_r-B_l}$, and $B_r$ w.p. $\frac{x-B_l}{B_r-B_l}$.
   \STATE {\bfseries Output:} $\mathcal{M}(x)$
\end{algorithmic}%}
\end{algorithm}


%One possible way of inducing high utility, i.e., ensure a small distance between the output and the input, is assigning a higher selection probability to bins that are closer to the input, i.e., $q_n(i) < q_n(i+1)$. One example is illustrated in Fig.~\ref{fig:dis}. Selection distribution have to be designed in advance and is consistent for all possible input $x$. Assume that $-c \leq B_{i} < B_k \leq c$, then we have to decide at least $k-i+2$ selection distributions, from $q_{i-1}$ to $q_k$.   

\begin{figure}[ht]
%\vskip 0.2in
\begin{center}
\centerline{\includegraphics[width=0.8\columnwidth]{uai2024/fig/fig_qij.pdf}}
\caption{An example of selection distribution}
\label{fig:dis}
\end{center}
\vskip -0.4in
\end{figure}

% Before discussing our algorithm for finding the optimal mechanism $\mathcal{M}$, we would like to present two special cases of our proposed mechanism.%  we want to discuss that our proposed mechanism $\mathcal{M}$ can be reduced to 

\subsection{Special cases}\label{subsec:special}
Section~\ref{subsec:mechanism} presents a general framework for quantization. Indeed, some existing mechanisms proposed in prior work can be regarded as special cases of ours, as detailed below.

\textbf{Randomized Quantization Mechanism (\textsf{RQM}).}~~%\label{subsec:rqm}
It was proposed by \citet{rqm} and is a special case of ours. Specifically, $\forall x \in [-c, c]$, \textsf{RQM} randomly outputs one bin from $\{B_1,\cdots,B_m\}$ with $$B_i = -\Delta-c + (i-1)\frac{2c+2\Delta}{m-1}, ~ i\in [m]. $$ That is, the $m$ bins are equally spaced over the interval $[-c-\Delta, c+\Delta]$. This differs from our mechanism, which allows non-uniformly spaced bins and treats the bin values as parameters to be optimized (see details in Section~\ref{subsec:gen}). 

To quantize $x$, \textsf{RQM} first selects a subset of bins: $B_1$ and $B_m$ are selected with probability $1$, while each of the remaining $m-2$ bins $\{B_2,\cdots, B_{m-1}\}$ is selected independently with probability $q<1$. Among the selected bins, the one closest to $x$ on the left (resp. right) side is denoted as $B_l$ (resp. $B_r$). Finally, \textsf{RQM} randomly outputs either $B_l$ or $B_r$ according to Eq.~\eqref{equ:dither}. \textsf{RQM} is thus a special case of our mechanism in which the selection distribution follows a \textit{Geometric distribution} with parameter $q$, i.e.,  
%each bin will first be selected with probability $q$ (except for $B_1$ and $B_m$, which will always be selected). Among all the selected bins, the bin closest to the input $x$ on the left side (denoted as $B_l$) and the closest bin on the right side (denoted as $B_r$) will be the potential outputs. The probability of $B_r$ and $B_l$ being selected as the output is given by Equation~\ref{equ:dither}. RQM is a special case of our mechanism in which selection distribution follows a \textbf{Geometric distribution} with parameter $q$, 
\begin{equation*}
    q_j(i) = 
\begin{cases}
    (1-q)^{j-1},& \text{if $i = 1$} \\
    q{(1-q)}^{j-i}, & \text{if $1 < i \leq j$}.
\end{cases}
\end{equation*}
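
As a sanity check on this PMF (our own reading of the selection step, not a statement taken from \citet{rqm}): for $1<i\leq j$, $B_i$ is the closest selected bin on the left exactly when $B_i$ is selected and $B_{i+1},\cdots,B_j$ are not, which has probability $q(1-q)^{j-i}$; $B_1$ is always selected, so it is the closest one exactly when none of $B_2,\cdots,B_j$ is selected. The probabilities sum to one:
\begin{align*}
\sum_{i=1}^{j} q_j(i) &= (1-q)^{j-1} + q\sum_{k=0}^{j-2}(1-q)^{k}\\
&= (1-q)^{j-1} + \left(1-(1-q)^{j-1}\right) = 1.
\end{align*}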

%\iffalse

%\begin{equation}\label{equ:rqm}
%            q_j(i) = \left\{                    q{(1-q)}^{j-i},~ i\in [j]                    \right\end{equation}\fi

%Note that another distinction between our mechanism and RQM is that RQM assumes all bins are uniformly distributed between $[-c-\Delta, c+\Delta]$, while our mechanism can also adapt non-uniformly distributed bins.

\paragraph{Exponential Randomized Mechanism (\textsf{ERM}).}%\label{subsec:erm}
Inspired by the classic \emph{Exponential Mechanism} (Definition~\ref{def:em}), we propose \textsf{ERM}, which outperforms \textsf{RQM} (see the comparison in Section~\ref{sec:res}) and is also a special case of our proposed mechanism. Under \textsf{ERM}, bins are symmetric and satisfy $B_i = -B_{m+1-i}, \forall i \in [m]$.  \textsf{ERM} uses a selection distribution similar to that of the exponential mechanism. Specifically, for input $x\in [B_j,B_{j+1})$, the PMF $\Pr\{L_j  = i\} = q_j(i)$ in \textsf{ERM} depends on the distance between bins $B_i$ and $B_j$, and %($B_j$ is the closest bin to $x$ on the left side of $x$). Similarly, under ERM, 
$\Pr\{R_j  = i\} = q_{m-j}(m+1-i)$ depends on the distance between bins $B_i$ and $B_{j+1}$. In other words, \textsf{ERM} uses the following selection distribution: %among the closest bin to $x$ is the absolute distance from the object to the bin with the highest index (hence closest to the input), and the privacy loss is denoted as $\gamma$. The selection distribution is given as follows:
\begin{equation}\label{equ:exp}
    q_j(i) = 
        \frac{\exp\left\{\frac{\gamma(B_i-B_j)}{2(B_j-B_1)}\right\}}{\sum_{k=1}^{j}\exp\left\{\frac{\gamma(B_k-B_j)}{2(B_j-B_1)}\right\}},
\end{equation}
where $\gamma$ is a hyperparameter affecting both the privacy and the accuracy of $\mathcal{M}$. After obtaining the realizations of $L_j$ and $R_j$, \textsf{ERM} uses Eq.~\eqref{equ:dither} to determine the final output. 
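
For intuition about \eqref{equ:exp}, note that the exponent equals $0$ at $i=j$ and $-\gamma/2$ at $i=1$, so for any bin values and any $j\geq 2$,
\begin{equation*}
\frac{q_j(j)}{q_j(1)} = e^{\gamma/2}.
\end{equation*}
That is, the nearest bin is $e^{\gamma/2}$ times more likely to be selected than the farthest one, regardless of $j$. With equally spaced bins (an assumption made here only for illustration), the exponent simplifies to $\frac{\gamma(i-j)}{2(j-1)}$. For $j=1$ the exponent in \eqref{equ:exp} has the form $0/0$; we read this case as $q_1(1)=1$, consistent with the earlier note, which is our interpretation rather than something the formula itself states.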

Next, we provide privacy and accuracy analysis for \textsf{ERM}. Theorem~\ref{thm:erm_privacy} below provides an upper bound for privacy loss. %The proof can be found in the Appendix.

\begin{theorem}[Privacy loss of \textsf{ERM}]\label{thm:erm_privacy}
Assume the interval $[-c-\Delta, c+\Delta]$ is divided uniformly into $m$ bins, i.e., $$B_i = -\Delta-c + (i-1)\frac{2c+2\Delta}{m-1}, ~ i\in [m].$$ %Let $c$, $\Delta$, $m$, $\gamma$ be the parameters of ERM.
Then \textsf{ERM} satisfies DP with privacy loss %privacy loss $\epsilon$ is upper bounded:
\begin{equation}\label{eq:privacy_erm}
 \epsilon < \gamma + \log \frac{2m(c+\Delta)}{c}.   
\end{equation}
\end{theorem}

\iffalse

\begin{theorem}\label{theo:1}
Assume that interval $[-c-\Delta, c+\Delta]$ is divided uniformly into $m$ bins (i.e., $B_i = -\Delta-c + (i-1)\frac{2c+2\Delta}{m} ~ i\in [m]$). Given the parameters of \textsf{ERM}: $c$, $\Delta$, $m$, $\gamma$, privacy loss $\epsilon$ is upper bounded by:
    $\epsilon < \log ( \frac{m+1}{\Delta} ) + \frac{3m^2(c+\Delta)^2\gamma}{\Delta c(m-1)^2}$.
\end{theorem}

\begin{proof}
The proof can be found in the Appendix.   
\end{proof}

\fi
%The upper bound indicates that when $\Delta$ approaches 0, the privacy loss will be infinite. Hence, it is required to have $\Delta >0$. Besides, 
The upper bound~\eqref{eq:privacy_erm} increases with the number of bins $m$ and the parameter $\gamma$. It is worth noting that, according to \citet{rqm}, the privacy loss of \textsf{RQM} is bounded by $$\log\left(\frac{2(1-q)^2(c+\Delta)}{\Delta}\right) +m\log\frac{1}{1-q}.$$ Thus the privacy-loss bound of \textsf{RQM} grows with $m$ at the rate of $\mathcal{O}(m)$, whereas the bound for our \textsf{ERM} grows only at the rate of $\mathcal{O}(\log m)$.  


The next theorem provides an upper bound for the expected absolute error of \textsf{ERM}. %The proof can be found in the Appendix.
\begin{theorem}[Error of \textsf{ERM}]\label{thm:erm_error}
Under the same bins as Theorem~\ref{thm:erm_privacy}, %Let $B_i = -\Delta-c + (i-1)\frac{2c+2\Delta}{m-1}, ~ i\in [m]$. Given the parameters of the mechanism: $c$, $\Delta$, $m$, $\gamma$, 
the expected absolute error of \textsf{ERM} is bounded:
\begin{equation*}
 \mathbb{E}\left(|\mathcal{M}(x)-x|\right) \leq \frac{4}{\gamma}\log\left(m\right)\left(c+\Delta\right)+\frac{2c+2\Delta}{m-1}. 
\end{equation*}
\end{theorem}
The bound implies that when the extended range $\Delta$ increases or the privacy parameter $\gamma$ decreases (stricter privacy protection), the error bound increases. 


%\begin{proof}
%The proof can be found in the Appendix.   
%\end{proof}

\section{Optimal Mechanism}\label{subsec:gen}

Section~\ref{subsec:mechanism} introduced the general framework of our quantization mechanism. Different bin values $\{B_1,\cdots,B_m\}$ and selection distributions $q_j,q_{m-j}$ yield different mechanisms, two special cases of which were discussed in Section~\ref{subsec:special}. In this section, we explore how to find the optimal mechanism by tuning these parameters. We call the quantization mechanism under the optimal parameter configuration the ``{\textbf{OPT}imal randomized quantization \textbf{M}echanism (\textsf{OPTM})}.'' Before introducing \textsf{OPTM}, we first quantify the privacy loss and the mean absolute error of our mechanism under given bin values $\{B_1,\cdots, B_m\}$ and selection distributions.

\subsection{Performance measure}\label{subsec:measure}

 Given bin values $\{B_1,\cdots,B_m\}$ and selection distributions $q_j,q_{m-j}$, we can find the \textit{output distribution} $\Pr\{\mathcal{M}(x) = B_i\}, i \in [m]$, for any input $x$. Let $p(x, i):= \Pr\{\mathcal{M}(x) = B_i\}$ be the probability that the output of the mechanism $\mathcal{M}$ for an input $x$ is $B_i$. Then, for $x \in [B_j, B_{j+1})$, the probability that $\mathcal{M}$ outputs a bin $B_l$ on the left of $x$ can be calculated by the law of total probability as follows,
\begin{equation*}\label{equ:p_left}
\resizebox{0.43\textwidth}{!}{$\displaystyle   p(x, l)=\Pr\{L_j=l\} \sum_{m \geq r\geq j+1}\left( \Pr\{R_j = r \} \frac{B_r-x}{B_r-B_l} \right).$}  
\end{equation*}
Similarly, for a bin $B_r$ on the right side of $x$, we have
\begin{equation*}\label{equ:p_right}
\resizebox{0.43\textwidth}{!}{$  \displaystyle  p(x, r)=\Pr\{R_j = r\} \sum_{1\leq l \leq j}\left( \Pr\{L_j = l\}\frac{x-B_l}{B_r-B_l} \right). $}
\end{equation*}
Hence, the output probability of each bin $B_i$ is given by:
{
\begin{align}\label{equ:output}
 \resizebox{0.43\textwidth}{!}{$  p(x,i) =\begin{cases}
    \displaystyle        q_j(i)  \sum_{r \in [j+1,m]} \left( q_{m-j}(m-r+1) \frac{B_r-x}{B_r-B_{i}}  \right), & \text{if $B_{i} \leq x$} \\
            \displaystyle q_{m-j}(m+1 - i )  \sum_{l \in [1, j]}\left( q_j(l) \frac{x-B_l}{B_{i}-B_l} \right), & \text{o.w.}
        \end{cases}$}
\end{align}}
\paragraph{Performance measure.} With the output distribution computed above, we can quantify the mean absolute error (MAE) of a mechanism $\mathcal{M}$ as follows,
\begin{equation}\label{eq:objective}\mathbb{E}\left(\lvert\mathcal{M}(X)-X\rvert\right) = \mathbb{E}_X\left(\sum_{i \in [m]} p(X,i) \lvert B_{i}-X\rvert\right).
\end{equation} 
To satisfy differential privacy with privacy loss bounded by $\epsilon$, the output distribution must satisfy: 
\begin{align}\label{equ:con}
    \frac{p(x,i)}{p(x^{\prime},i)} \leq e^{\epsilon}, \quad \forall x,x' \in [-c, c], i \in [m].
\end{align}
Our goal is to design the parameters of $\mathcal{M}$, including $\Delta$, the bin values $\{B_{1},\cdots,B_m\}$, and especially the selection distributions $q_j, q_{m-j}$, such that the MAE is minimized subject to a bounded privacy loss $\epsilon$. Note that since $m$ determines the number of bits for quantizing $x$ (e.g., 2 bits corresponds to $m=4$), we assume $m$ is pre-defined and is not a variable to be optimized.
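
As a minimal illustration of the trade-off between \eqref{eq:objective} and \eqref{equ:con}, consider the smallest case $m=2$ (chosen here only for concreteness), where $B_1=-c-\Delta$, $B_2=c+\Delta$, and $q_1(1)=1$, so that no selection distribution needs to be designed. Then \eqref{equ:output} reduces to
\begin{align*}
p(x,1)&=\frac{c+\Delta-x}{2(c+\Delta)},\\
p(x,2)&=\frac{c+\Delta+x}{2(c+\Delta)},
\end{align*}
and both probabilities range over $\left[\frac{\Delta}{2(c+\Delta)},\frac{2c+\Delta}{2(c+\Delta)}\right]$ as $x$ varies in $[-c,c]$. Constraint~\eqref{equ:con} therefore holds if and only if
\begin{equation*}
\epsilon \geq \log\frac{2c+\Delta}{\Delta},
\end{equation*}
which requires $\Delta>0$, while the error for a fixed input is $\mathbb{E}(|\mathcal{M}(x)-x|)=\frac{(c+\Delta)^2-x^2}{c+\Delta}$, which grows with $\Delta$. In this toy case, a larger $\Delta$ thus buys privacy at the cost of accuracy.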
%The parameters in mechanism $\mathcal{M}$ are $\Delta, B_{i} ~ (i\in [m]), q_j(i)~(i\leq j,~ j \in [m])$. Note that  $m$ determines the number of bits for quantizing $x$ (e.g., 2 bits is equivalent to $m=4$). As a result, we assume that $m$ is a pre-defined variable and is not part of our optimization problem. Privacy budget $\epsilon$  imposes the following constraint on our optimization problem for finding optimal mechanism $\mathcal{M}$,%Given the  The objective of optimizing the parameters of our mechanism, including selection distribution and output bins, is to minimize such expected errors while preserving the differential privacy constraints:
%\begin{align}\label{equ:con}
  %  \frac{p(x,i)}{p(x^{\prime},i)} \leq e^{\epsilon} \quad \forall x,x' \in [-c, c], i \in [m].
%\end{align}

%In Section \ref{subsec:gen}, we will formulate the optimization problem for finding optimal $\mathcal{M}$ subject to \eqref{equ:con}.
\subsection{\textsf{OPTM} as a linear program}

%So far we introduced our quantization mechanism and discussed two special cases of such a mechanism. As we mentioned before, our mechanism has several parameters/degrees of freedom which should be tuned.  In particular, 
The problem of finding the optimal parameters of $\mathcal{M}$ can be formulated as an optimization problem. Our goal is to reduce it to a \textit{linear program} that can be solved efficiently with linear programming tools. We first derive a linear upper bound of the objective function \eqref{eq:objective}. Then, we describe how to turn the DP constraint~\eqref{equ:con} into linear constraints. Finally, we show how another set of constraints reduces the complexity when the number of output bits is large.

\paragraph{Linear upper bound for MAE.} Eq.~\eqref{eq:objective} shows that the mean absolute error is a non-linear function of $q_j(i)$. However, we can find a linear upper bound of it and use it as a proxy, as detailed below. %of $\mathbb{E}(|\mathcal{M}(x)-x|)$% that is linear, as shown below. %In particular, the upper bound can be obtained by the following theorem.
\begin{lemma}\label{lemma:x_error}
    For any input $x \in [B_{j}, B_{j+1})$, we have, %Then, for any given $x$, the following holds for Mean Absolute Error (MAE),
\begin{equation*} \mathbb{E}\left(|\mathcal{M}(x)-x|\right) \leq \frac{1}{2}\Big(\zeta_{m-j} + \left(B_{j+1}-B_{j}\right) + \zeta_{j}\Big), 
    \end{equation*}
    where $\zeta_n = \sum_{i \in [n]} q_n(i) (B_{n}-B_{i})$. 
\end{lemma}
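
A sketch of why a bound of this form holds (our own reasoning under the symmetric-bin assumption; the full proof may proceed differently): conditioned on the selected pair $(l,r)$, \eqref{equ:dither} gives
\begin{align*}
\mathbb{E}\left(|\mathcal{M}(x)-x| \mid l,r\right) &= \frac{2\,(x-B_l)(B_r-x)}{B_r-B_l} \\
&\leq \frac{B_r-B_l}{2},
\end{align*}
where the inequality is $4ab\leq (a+b)^2$ with $a=x-B_l$ and $b=B_r-x$. Writing $B_r-B_l=(B_r-B_{j+1})+(B_{j+1}-B_j)+(B_j-B_l)$ and taking expectations over $L_j$ and $R_j$ yields $\zeta_j$ for the last term and, using $B_i=-B_{m+1-i}$, $\zeta_{m-j}$ for the first.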
If we know the distribution of $X$, we can use Lemma~\ref{lemma:x_error} to further find a linear upper bound of  $\mathbb{E}(|\mathcal{M}(X)-X|)$. An example for uniformly distributed $X$ is given in Theorem~\ref{thm:uniform}.

%Note that this is just an example; our method can be applied to any other distribution.  

\begin{theorem}\label{thm:uniform}
Suppose the input $X\in[-c,c]$ follows a uniform distribution. Then $\mathbb{E}(|\mathcal{M}(X)-X|)$ can be upper bounded by
\begin{equation}
\label{equ:opti}
\resizebox{0.43\textwidth}{!}{$\displaystyle
 \min_{q_j(i)} \sum_{s\leq n\leq  t+1} \big(\min(c, B_{n})-\max(-c, B_{n-1})\big) \big(\zeta_{n-1} + \zeta_{m-n+1}\big),$}  
\end{equation}
where $\zeta_n = \sum_{i \in [n]} q_n(i) (B_{n}-B_{i})$, $B_{s-1}\in [-c-\Delta,-c)$ and $B_{t+1}\in (c, c+\Delta]$ are two bins that fall in the extended range, and $B_s<B_t$ are the bins in $[-c,c]$ closest to $-c$ and $c$, respectively.
\end{theorem}

%The proof can be found in the Appendix.  

%\paragraph{\hl{Extension to non-uniform distribution.}} Theorem~\ref{thm:uniform} assumes that input $X$ follows uniform distribution. Our mechanism can also be generalized to 
For more general cases where the distribution of the input $X$ is only partially known, non-uniform, or even asymmetric, our mechanism can still be adapted. %by leveraging non-uniformly distributed and asymmetric bins to capture the pattern of the distribution. 

Specifically, we first change the original definition of selection distribution in Section~\ref{subsec:mechanism} to the following:
\begin{align*}
  &\Pr\{L_j = i\}:=q_j^{(l)}(i),&i\in \{1,\cdots,j\}\\
  &\Pr\{R_j = i\}:=q_{m-j}^{(r)}(m+1-i),&i \in \{j+1,\ldots,m\}
\end{align*}
Both $q_j^{(l)}(\cdot), q_{m-j}^{(r)}(\cdot)$ for all possible $j \in [m]$ are parameters that need to be tuned. Then we can derive a linear upper bound of the mean absolute error (MAE) by extending Lemma~\ref{lemma:x_error} and Theorem~\ref{thm:uniform}. Theorem~\ref{thm:extend} below shows the result for non-uniformly distributed $X$ and asymmetric bins.

\begin{theorem}\label{thm:extend}
Suppose the input $X\in[-c,c]$ follows an arbitrary distribution. Then $\mathbb{E}(|\mathcal{M}(X)-X|)$ can be upper bounded by
\begin{equation}
\label{equ:opti}
\resizebox{0.43\textwidth}{!}{$\displaystyle
 \min_{q_j^{(l)}(i), q_{m-j}^{(r)}(i)} \sum_{i=s-1}^{t}  (\zeta_{m-i}^{(r)}+B_{i+1}-B_{i}+\zeta_{i}^{(l)}) \int_{\max(B_i, -c)}^{\min(B_{i+1}, c)} f_X(x) dx,$}  
\end{equation}
where $\zeta_{m-j}^{(r)} = \sum_{i \in \{j+1, \dots, m\}} q_{m-j}^{(r)}(m-i+1) (B_{i}-B_{j+1})$ and $\zeta_j^{(l)} = \sum_{i \in [j]} q_j^{(l)}(i) (B_{j}-B_{i})$. Here, $B_{s-1}\in [-c-\Delta,-c)$ and $B_{t+1}\in (c, c+\Delta]$ are two bins that fall in the extended range, $B_s<B_t$ are the bins in $[-c,c]$ closest to $-c$ and $c$, respectively, and $f_X(x)$ is the probability density function of $X$.
\end{theorem}
Note that the upper bound in Theorem~\ref{thm:extend} depends on the density $f_X(x)$ only through the bin probabilities $\Pr(B_i \leq X < B_{i+1}) = \int_{B_i}^{B_{i+1}} f_X(x)dx$, which are easier to obtain than the density itself and can be estimated from samples.

\paragraph{Linear differential privacy constraint.}  To satisfy $\epsilon$-DP, constraint~\eqref{equ:con} can be equivalently written as
\begin{equation}\label{equ:privacy_constraint}
\frac{\max_x p(x,i)}{\min_{x^{\prime}} p(x^{\prime},i)} \leq e^{\epsilon}, \quad \forall i.     
\end{equation}
However, constraint~\eqref{equ:privacy_constraint} is non-linear and needs to be converted into linear constraints. To this end, we first show in Lemma~\ref{lemma:pr_set} that for each $i \in [m]$, both $\max_x p(x,i)$ and $\min_x p(x,i)$ over $x \in [-c, c]$ belong to a finite set. This property is then leveraged to turn constraint~\eqref{equ:privacy_constraint} into a set of linear constraints. % Next Theorem expresses constraint \eqref{equ:privacy_constraint} as a set of linear constraints.   %For each possible output $B_{i}$, we need to find an upper bound for the numerator and a lower bound for the denominator of \eqref{equ:privacy_constraint} for any input $x$. %Denote $I$ as the set of indices of all bins locating inside the input range $[-c, c]$. According to the output probability given by Equation~\ref{equ:output}, for $i \in [m]$, $p(x,i)$ is maximized when $x=B_{i}$ since the multiplier term is always less than or equal to 1, which induces $\maxp(x,i)=q_i(i)$. For $i \notin I$, $\maxp(x,i)=\max\{p(c,1), p(B_k,1) (k \in I)\}$, since $p(x,i)$ is monotonically decreasing within each interval divided by bins (e.g., $[-c, B_k)$, $[B_k, B_{k+1})$, or $[B_k, c)$). 
\begin{lemma}\label{lemma:pr_set}
     Assume that $q_i(i) \geq q_j(i)$ for all $i,j \in [m]$ with $j \geq i$. Then, over inputs $x \in [-c, c]$ and for each $i \in [m]$: $$\textstyle \max_x p(x, i) \in \overline{\mathcal{S}}_i ~~~~\text{ and }~~~~\min_x p(x, i) \in \underline{\mathcal{S}}_i,$$ where both $\overline{\mathcal{S}}_i$ and $\underline{\mathcal{S}}_i$ are finite sets defined below.
\begin{align*}
\resizebox{0.42\textwidth}{!}{$
\overline{\mathcal{S}}_i= \begin{cases}
  \{q_i(i), q_{m+1-i}(m+1-i)\}, & \text{ if } B_{i} \in [-c,c] \\
 \{ p(-c, i)\} \cup \{p(B_k, i) | -c\leq B_k \leq c\},& \text{ if } B_i < -c.\\
 \{p(c, i)\} \cup \{p(B_k, i) | -c\leq B_k \leq c\},& \text{ if } B_i >c.
\end{cases}$}
\end{align*}
\begin{align*}
\resizebox{0.48\textwidth}{!}{$ \displaystyle
\underline{\mathcal{S}}_i= \begin{cases}
\displaystyle \left\{p(-c,i), p(c,i)\right\}\cup \Big\{ \lim_{x \to B_k} p(x,i) |-c\leq B_k \leq c\Big\}, & \text{ if } B_{i} \in [-c,c] \\
 \displaystyle \left\{p(c,i)\right\}\cup \Big\{ \lim_{x \to B_k} p(x,i) |-c\leq B_k \leq c\Big\},& \text{ if } B_i < -c.\\
\displaystyle \left\{p(-c,i)\right\} \cup \Big\{ \lim_{x \to B_k} p(x,i) |-c\leq B_k \leq c\Big\},& \text{ if } B_i >c.
\end{cases}$}
\end{align*}
where $\lim_{x \to B_k}p(x,i)$ above is calculated as follows
\begin{align*}
\resizebox{0.48\textwidth}{!}{$ \displaystyle
 \lim_{x \to B_k}p(x,i)= 
 \begin{cases}
\displaystyle q_{k-1}(i)  \sum_{r \in [k+1, m]}\bigg( q_{m-k+1}(m-r+1) \frac{B_r-B_k}{B_r-B_{i}} \bigg),& \text{ if }B_i < B_k.\\
\displaystyle q_{m-k}(m+1-i)  \sum_{l \in [1, k-1]}\bigg( q_k(l) \frac{B_k-B_l}{B_i-B_l} \bigg),& \text{ if }B_i > B_k.
 \end{cases} $}  
\end{align*}
%if $-c \leq B_{i} \leq c$, then for maximal probability, we have:{\small\begin{equation*}   \max_x p(x, i) = q_i(i).\end{equation*}}

%If $B_i < -c$, we have:
%{\small
%\begin{equation*}   \max_x p(x, i) \in \{ p(-c, i)\} \cup \{p(B_k, i) | -c\leq B_k \leq c\}.\end{equation*}}

%If $B_i > c$, we have:{\small\begin{equation*}  \max_x p(x, i) \in \{p(c, i)\} \cup \{p(B_k, i) | -c\leq B_k \leq c\}.\end{equation*}}
    
%For minimal probability, if $-c \leq B_{i} \leq c$, we have:
%{\small\begin{equation*}  \min_x p(x,i) \in \{ \lim_{x \to B_k} p(x,i) |-c\leq B_k \leq c\} \cup \{p(-c,i), p(c,i)\}.\end{equation*}}

%If $B_{i} < -c$, we have:{\small\begin{equation*}   \min_x p(x,i) \in \{ \lim_{x \to B_k} p(x,i) |-c\leq B_k \leq c\} \cup \{p(c,i)\}.\end{equation*}}

%If $B_{i} > c$, we have:{\small\begin{equation*} \min_x p(x,i) \in \{ \lim_{x \to B_k} p(x,i) |-c\leq B_k \leq c\} \cup \{p(-c,i)\}.\end{equation*}}

%$\lim_{x \to B_k}p(x,i)$ is calculated as follows. If $B_i < B_k$, then we have:

%{\small
%\begin{equation}
%\begin{aligned}
  %  &\lim_{x \to B_k}p(x,i) \\  
 %   &=  q_{k-1}(i)  {\sum}_{r \in [k+1, m]}\bigg( q_{m-k+1}(m-r+1) \frac{B_r-B_k}{B_r-B_{i}} \bigg) \label{equ:min_pr}.
%\end{aligned}
%\end{equation}}

%If $B_i > B_k$, we have:

%{\small
%\begin{equation}
%\begin{aligned}
 %   &\lim_{x \to B_k}p(x,i) \\
 %   & q_{m-k}(m+1-i)  {\sum}_{l \in [1, k-1]}\bigg( q_k(l) \frac{B_k-B_l}{B_i-B_l} \bigg).
%\end{aligned}
%\end{equation}}

\end{lemma}
Note that $p(x,i)$ is discontinuous in $x$, so $\lim_{x \to B_k}p(x,i)$ may not equal $p(B_k,i)$. Lemma~\ref{lemma:pr_set} shows that for each $i \in [m]$, there are only finitely many candidate values for both $\max_x p(x,i)$ and $\min_x p(x,i)$. Therefore, if we ensure that $\overline{s} \leq e^\epsilon \cdot \underline{s}$ holds for every $\overline{s} \in\overline{\mathcal{S}}_i$ and $\underline{s} \in\underline{\mathcal{S}}_i$, then privacy constraint~\eqref{equ:privacy_constraint} is also guaranteed to hold. The finiteness comes from the monotonicity of the output probability between each pair of adjacent bins: the extrema can only occur at bin values or at the endpoints $\pm c$. Example 1 uses specific output distributions of $\textsf{ERM}$ to illustrate this.
%We provide an example below.
\begin{example}
    Figure~\ref{fig:erm_dist} shows two probabilities $p(x,3)$ and $p(x,6)$ of $\textsf{ERM}$ when $m=8$. Note that $\lim_{x \to B_3^{+}} p(x,3)=p(B_3,3)=q_3(3)$ and $\lim_{x \to B_3^{-}} p(x,3)=q_6(6)$. Hence $\max_x p(x,3)\in \{q_3(3), q_6(6)\}$. When $x$ increases from $B_3$ to $B_4$, or decreases from $B_3$ to $-c$, $p(x,3)$ decreases. When $x$ increases from $B_4$ to $B_5$, from $B_5$ to $B_6$, or from $B_6$ to $c$, $p(x,3)$ also decreases. Hence, we have $\min_x p(x,3) \in \{p(-c,3)$, $\lim_{x \to B_4} p(x,3)$, $\lim_{x \to B_5} p(x,3), \lim_{x \to B_6} p(x,3), p(c,3)\}$. The curve of $p(x,3)$ is symmetric to that of $p(x,6)$ around 0. In Theorem~\ref{thm:constraint1}, we will use this property to obtain compact privacy constraints.
    \begin{figure}[h]
    \centering
     % \vspace{-0.3cm}
      \includegraphics[width=0.85\linewidth]{uai2024/fig/erm_distribution.pdf}
   %    \vspace{-0.cm}
    \caption{An example of output distribution}
    \label{fig:erm_dist}
  %  \vspace{-0.6cm}
\end{figure}
\end{example}


%indicates that for each index $i \in [m]$, we can know that $\max_x p(x,i)$ and $\min_x p(x,i)$ must exist in a finite set, namely the set of possible maximal probabilities and possible minimal probabilities. For each $i$ and each element within these two sets, i.e., each possible maximal probability $\mathrm{MaxPr}$ and each possible minimal probability $\mathrm{MinPr}$, if constraint $\mathrm{MaxPr} \leq e^\epsilon \mathrm{MinPr}$ satisfies, then the privacy constraint \eqref{equ:privacy_constraint} also satisfies. 

However, $\lim_{x \to B_k}p(x,i)$ and $p(x,i)$ are quadratic functions in $q_j(i)$ (see Section~\ref{subsec:measure}). We still need to convert them into linear forms. To this end, we further assume that each probability $q_j(i)$ has a non-zero lower and upper bound, i.e., $$o_j(i) \leq q_j(i) \leq u_j(i), ~~i,j \in [m], i \leq j,$$ 
where $o_j(i)$ and $u_j(i)$ are hyperparameters that can be found by a grid search. Then, we can replace $q_j(i)$ with $o_j(i)$ or $u_j(i)$ to get a lower bound  $w(x,i)$ or upper bound $z(x,i)$ of $p(x,i)$. We illustrate this using an example.
\begin{example}\label{example}
Consider the following constraint 
\begin{equation}\label{equ:example_constraint}
    p(B_k,i) \leq e^\epsilon \cdot \lim_{x \to B_k}p(x,i),
\end{equation}
where $B_i < B_k$, $p(B_k,i)\in \overline{\mathcal{S}}_i$ and $\lim_{x \to B_k}p(x,i)\in \underline{\mathcal{S}}_i$ by Lemma~\ref{lemma:pr_set}. To make constraint~\eqref{equ:example_constraint} linear, we can replace $p(B_k,i)$ with an upper bound $z(B_k,i)$ and replace $\lim_{x \to B_k}p(x,i)$ with a lower bound $w(B_k,i)$. Specifically, $\forall x \in [B_j, B_{j+1})$,  
\begin{equation}\label{equ:def_z}
\resizebox{0.4\textwidth}{!}{$
\displaystyle
    z(x,i) = u_j(i)  \sum_{r \in [j+1,m]} \left( q_{m-j}(m-r+1) \frac{B_r-x}{B_r-B_i}  \right) $}
\end{equation}
\begin{equation}\label{equ:def_w}
 \resizebox{0.48\textwidth}{!}{$
\displaystyle   w(x,i) = 
\begin{cases}
  \displaystyle  o_{j}(i)  \sum_{r \in [j+1, m]}\bigg( q_{m-j}(m-r+1) \frac{B_r-x}{B_r-B_i} \bigg),& \text{ if }x = -c \text{ or } c.\\
 \displaystyle   o_{j-1}(i)  \sum_{r \in [j+1, m]}\bigg( q_{m-j+1}(m-r+1) \frac{B_r-B_j}{B_r-B_{i}} \bigg), & \text{ o.w. }
\end{cases}
$}
\end{equation}

Instead of using constraint~\eqref{equ:example_constraint}, we use a stricter version
\begin{equation*}
z(B_k,i) \leq e^\epsilon \cdot w(B_k,i).
\end{equation*}
\end{example}
Similar to Example~\ref{example}, for any $\overline{s} \in\overline{\mathcal{S}}_i$  and $\underline{s} \in\underline{\mathcal{S}}_i$, we can turn the non-linear constraint $\overline{s} \leq e^\epsilon \cdot \underline{s}$ into a stricter version that is linear. Moreover, since the bins and the output distribution are symmetric around 0, we only need to find $\min_x p(x,i)$ over $x \in [B_i, c]$ instead of $[-c,c]$. This results in the constraints detailed in Theorem~\ref{thm:constraint1}. %Since output distribution is symmetric around 0 (by Section~\ref{subsec:measure}), we can Theorem~\ref{theo:4} 

%For example, one privacy constraint derived from Lemma~\ref{lemma:1} is as follows:



%Replace $p(B_k,i)$ with upper bound $z(B_k,i)$, where $z(x,i) (x \in [B_j, B_{j+1}])$ is calculated by:



%Replace $\lim_{x \to B_k}p(x,i)$ with lower bound $w(B_k,i)$, where $w(x,i) (x \in [B_j, B_{j+1}])$ is calculated by:



%Now we can have the following linear constraint: 


%which implies \eqref{equ:example_constraint}. Hence we can use $w(x,i)$ and $z(x,i)$ to replace $p(x,i)$ to get linear privacy constraints. Besides, since the bins and the output distribution is symmetric around 0, we only need to consider the probability that $x$ is mapped to a bin on its left. Therefore we have the following constraints.
\begin{theorem}\label{thm:constraint1}
If the bins are symmetric, i.e., $B_{i} = -B_{m+1-i}$, then the privacy constraint \eqref{equ:privacy_constraint} is satisfied whenever the following $\mathcal{O}(m^3)$ linear constraints are satisfied.
\begin{itemize}[leftmargin=*]
{\small
    \item $\forall i,k \in [m], -c \leq B_{i} < B_k\leq c:$
    \begin{align*}
    & q_i(i) \leq e^\epsilon \cdot w(B_k,i); & & q_{m+1-j}(m+1-j) \leq e^\epsilon \cdot w(B_k,i); \\
    &  q_i(i) \leq e^\epsilon \cdot w(c,i); & & q_{m+1-j}(m+1-j) \leq e^\epsilon \cdot w(c,i);
    \end{align*}
    \item 
$\forall i,k \in [m], B_{i} < -c < B_k:$
\begin{align*}
    z(B_k,i) \leq e^\epsilon \cdot w(B_k,i); & &z(B_k,i) \leq e^\epsilon \cdot w(c,i);\\
    z(-c,i) \leq e^\epsilon \cdot w(B_k,i); &&z(-c,i) \leq e^\epsilon \cdot w(c,i);
\end{align*}
    \item $\forall i,j \in [m], i \leq j, B_{j} \leq c, B_{j+1} > -c:$
    \begin{align*}
     o_{j}(i) \leq q_{j}(i) \leq u_j(i); \qquad 
     q_i(i) \geq q_j(i)
    \end{align*}
    }
\end{itemize}


%$\forall i,k \in [m], -c \leq B_{i} \leq c,  k > i:$
%\begin{itemize}
 %   \item $q_i(i) \leq e^\epsilon \cdot w(B_k,i)$
%    \item $q_i(i) \leq e^\epsilon \cdot w(c,i)$
%\end{itemize}

%$\forall i,k \in [m], B_{i} \leq -c,  k > -c:$

%\begin{itemize}
%    \item $z(B_k,i) \leq e^\epsilon \cdot w(B_k,i)$
 %   \item $z(B_k,i) \leq e^\epsilon \cdot w(c,i)$
 %   \item $z(-c,i) \leq e^\epsilon \cdot w(B_k,i)$
 %   \item $z(-c,i) \leq e^\epsilon \cdot w(c,i)$
%\end{itemize}

%$\forall i,j \in [m], i \leq j, B_{j} \leq c, B_{j+1} > -c:$
%\begin{itemize}
%    \item $o_{j}(i) \leq q_{j}(i) \leq u_j(i)$
%\end{itemize}

where $w(\cdot,\cdot)$ and $z(\cdot,\cdot)$ are specified in \eqref{equ:def_w} and \eqref{equ:def_z}. 
% Moreover, there exists o_j(i), j\in [m] i\leq j such that constrat 12 also implies the above constraint.
\end{theorem}

\paragraph{Complete optimization.} Combining the above results, we can formulate a linear program for the optimal mechanism. Specifically, we minimize the upper bound of MAE in Theorem \ref{thm:uniform} subject to 1) the set of linear constraints in Theorem \ref{thm:constraint1}, and 2) the constraints that make each $q_j$ a valid probability distribution, i.e.,
%\begin{align*}
    $0 \leq q_j(i) \leq 1,
    \sum_{i=1}^j q_j(i)=1, \forall i, j$.
    
%\end{align*}
The complete procedure for finding the optimal mechanism is shown in Algorithm~\ref{alg:optm}. The optimization can be solved by a linear programming tool, denoted by $\mathrm{LinProg}()$, which takes the bin values $\{B_{i}\}_{i\in[m]}$, the privacy parameter $\epsilon$, and the lower and upper bounds $o_{j}(i), u_{j}(i)$, $i,j\in [m], i\leq j$, as inputs and returns the optimal objective value together with the optimal selection distribution $\{q_j(i)\}$.

Here we regard $o_{j}(i), u_{j}(i)$ as hyperparameters and use grid search to find the optimal ones. Although bin values $\{B_i\}$ are treated as inputs in Algorithm~\ref{alg:optm}, we can find the optimal bins $\{B_i\}$ using techniques such as grid search to further minimize MAE under a fixed privacy parameter $\epsilon$.  
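
For concreteness, each call to $\mathrm{LinProg}()$ with fixed bins, fixed $\epsilon$, and fixed bounds $o_j(i), u_j(i)$ can be summarized as the following program. Here $\overline{\mathrm{MAE}}$ is shorthand introduced only for this summary (it is not notation from Theorem~\ref{thm:uniform}) and stands for the MAE upper bound given there:
\begin{align*}
\min_{\{q_j(i)\}} \quad & \overline{\mathrm{MAE}}\big(\{q_j(i)\}\big) \\
\text{s.t.} \quad & \text{the linear constraints in Theorem~\ref{thm:constraint1}}, \\
& 0 \leq q_j(i) \leq 1, \quad \forall i,j \in [m],\ i \leq j, \\
& \textstyle\sum_{i=1}^{j} q_j(i) = 1, \quad \forall j \in [m].
\end{align*}
The grid search in Algorithm~\ref{alg:optm} repeats this program for each candidate choice of bounds and keeps the solution with the smallest objective value.
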

%We first fix the bins $B_{i} (i \in [m])$ and privacy budget $\epsilon$ as the inputs, iterate over calculate the corresponding objective function and constraints, use linear programming solver $\mathrm{LinProg}()$ to solve the optimization problem, and return the value of the objective function $obj$ and the optimal selection probabilities $\{q_j(i)\}$. Hence, we can use grid search over $o_j(i)$ and $u_j(i)$ to find the optimal probabilities $\{q_j(i)\}$ with the minimal objective value $obj$. We can also use gird search to find the optimal bins with the minimal Mean Absolute Error in a similar way. 



\paragraph{Reduce complexity.}
As the number of bins $m$ increases, both the number of probabilities $q_j(i)$ and the number of lower and upper bounds $o_j(i), u_j(i)$ to be searched increase. Since the optimal $o_j(i), u_j(i)$ are found via grid search, running Algorithm~\ref{alg:optm} can be computationally expensive when $m$ is large. Nonetheless, we can formulate the original privacy constraint \eqref{equ:privacy_constraint} as another set of linear constraints, which are also stricter but, compared to the constraints in Theorem~\ref{thm:constraint1}, significantly reduce the number of $o_j(i)$ and $u_j(i)$ over which the grid search must be conducted. 

\begin{algorithm}
% {\small   
\caption{\textsf{OPTM}: find optimal selection distribution}
    \label{alg:optm}
    \begin{algorithmic}[1]
        \STATE {\bfseries Input:} bin values $\{B_1,\cdots,B_m\}$, privacy parameter $ \epsilon$
        \STATE $min\_value \gets \infty$;
        \STATE $P = \emptyset$;
        \FOR{all possible $o_j(i), u_j(i)$ pairs in grid search}
        \STATE $obj, \{q_j(i)\} \gets \mathrm{LinProg}\left(\{B_{i}\}_{i\in[m]}, \epsilon, o_j(i), u_j(i)\right)$;
        
        \IF{$obj \leq min\_value$}
        \STATE $min\_value \gets obj$;
        \STATE $P \gets \{q_j(i)\}$;
        \ENDIF
        \ENDFOR
        \STATE {\bfseries Return:} selection probabilities $P$
    \end{algorithmic}%}
\end{algorithm}

%We can use extra assumptions to reduce the number of $o_j(i)$ and $u_j(i)$ that are required to conduct grid search upon during the optimization process, as stated in Theorem~\ref{theo:5}. 
\begin{theorem}\label{thm:constraint2}
If the bins are symmetric, i.e., $B_{i} = -B_{m+1-i}$, then the privacy constraint \eqref{equ:privacy_constraint} is satisfied whenever the following linear constraints are satisfied.
\begin{itemize}[leftmargin=*]
{\small 
    \item $\forall i,j \in [m], i \leq j:$
        \begin{align*}
            q_j(i) \geq q_{j+1}(i); && q_j(i) \leq q_{j}(i+1);
        \end{align*}
    \item $\forall i\in [m], -c \leq B_i \leq c:$
        \begin{align*}
            q_i(i) \geq q_{i+1}(i+1);
        \end{align*}
    \item Let $B_s<B_t$ be bins in $[-c,c]$ closest to $-c$ and $c$, respectively:
  \begin{align*}
           & z(-c,s-1) \leq e^{\epsilon} \cdot w(B_t,1); & & q_s(s) \leq e^{\epsilon} \cdot w(B_t,1);  \\
           & z(-c,s-1) \leq e^{\epsilon} \cdot w(c,1); & & q_s(s) \leq e^{\epsilon} \cdot w(c,1).
        \end{align*}
    \item $\forall r, k \in [m], s \leq k \leq t, r > k+1$:
        \begin{align*}
            \frac{q_{m-k+1}(m-r+1)}{B_{r}-B_{k+1}}& \geq &\frac{q_{m-k}(m-r+1)}{B_{r}-B_{k}}; \\
            \frac{q_{m-k}(m-r+1)}{B_{r}-B_{k+1}}& \geq &\frac{q_{m-k-1}(m-r+1)}{B_{r}-B_{k}};
        \end{align*}
        }
\end{itemize}


\end{theorem}
The constraints in Theorem~\ref{thm:constraint2} imply that
\begin{align*}
& \max_{x,i} p(x,i) \in \left\{p(-c,s-1), q_s(s)\right\}; \\
& \min_{x,i} p(x,i) \in \left\{\lim_{x \to B_t} p(x,1), p(c,1)\right\},
\end{align*}
where $s$ and $t$ are as defined in Theorem~\ref{thm:constraint2}. Under this set of linear constraints, the upper bound $u_j(i)$ appears only when calculating $z(-c,s-1)$, and the lower bound $o_j(i)$ is used only for computing $w(B_t,1)$ and $w(c,1)$. Thus, the grid search involves only three variables, regardless of the number of output bits. 
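
As a rough illustration of the savings, suppose every bound is searched independently over $g$ candidate values, where $g$ is a hypothetical grid resolution introduced only for this comparison. The constraints in Theorem~\ref{thm:constraint1} involve up to $m(m+1)/2$ index pairs $i \leq j$, each with its own $o_j(i)$ and $u_j(i)$, so the grid search may require up to $g^{m(m+1)}$ calls to $\mathrm{LinProg}()$. Under Theorem~\ref{thm:constraint2}, only three values are searched, giving $g^{3}$ calls. For instance, with $m=4$ and $g=5$,
\begin{equation*}
g^{m(m+1)} = 5^{20} \approx 9.5 \times 10^{13} \quad \text{versus} \quad g^{3} = 125.
\end{equation*}
These counts are worst-case figures under the independence assumption above; tying bounds together or pruning the grid would lower the first number.
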

%In the next part, we will give a concrete example about what are the optimization problem look like when $m=4$, and how to reduce the complexity of the grid search, which can remain to be $\mathcal{O}(1)$ regardless of the number of $m$.
