
\section{Introduction}
\label{sec:intro}

Differential privacy (DP) has become a de facto standard for preserving individual privacy in data analysis, with applications ranging from simple tasks such as data collection and statistical analysis to complex machine learning tasks \citep{wang2020differentially,jayaraman2019evaluating,zhang2018improving,zhang2018recycled,zhang2019recycled,zhang2022differentially,liu2021robust,khalili2021designing,khalili2021improving,hopkins2022efficient}. It centers on the idea that the output of a mechanism or computational procedure should remain statistically similar when a single individual's input changes, thereby preventing meaningful inference about individuals from the observed output. Although many DP mechanisms, such as the Gaussian mechanism \citep{gaussian} and the Laplace mechanism \citep{laplace}, have been proposed to preserve individual privacy for different computational tasks, they are mostly designed for continuous outputs over the real numbers and are unsuitable for scenarios where discrete outputs are necessary.

Indeed, keeping outputs discrete is desirable and even necessary for many applications. For example, representing real numbers on a finite-precision computer requires discretization, but naive finite-precision rounding may compromise privacy~\citep{least_bit}. Real-valued outputs can also incur high communication overhead, and compressing continuous inputs into discrete, bounded outputs may be necessary in bandwidth-constrained settings such as federated learning \citep{fedpaq,jin2024performative}. Moreover, continuous outputs are incompatible with cryptographic primitives such as secure aggregation~\citep{secagg}. It is thus essential to develop DP mechanisms that generate discrete outputs while preserving privacy.

To tackle the challenges mentioned above, many discrete DP mechanisms have been proposed, e.g.,~\citep{discrete-gaussian, skellam, dis_gau_fed}. However, the outputs generated by these mechanisms may be \textit{biased} under truncation. Because it is often crucial to maintain the \textit{unbiasedness} of private outputs in applications such as machine learning and survey data collection, these approaches may not be desirable. For instance, when differentially private gradients are used to update machine learning models, keeping them unbiased helps the model move toward the optimal solution and converge faster~\citep{optimization}.


%When applying DP mechanisms to machine learning algorithms (e.g., perturbing gradients during the model training process), the \textit{unbiasedness} of the output is important to \xr{ensure a tight convergence guarantee for algorithms like stochastic gradient descent~\citep{optimization}.} 

To the best of our knowledge, only a few works have proposed mechanisms that generate discrete unbiased outputs under DP. These include: 1) the \textit{Minimum Variance Unbiased Mechanism} (\textsf{MVU})~\citep{mvu}, which samples outputs from a discrete alphabet and achieves the optimal utility by optimizing both the sampling probabilities and the output alphabet. However, as the size of the output alphabet increases, this optimization problem becomes particularly challenging to solve and the unbiasedness constraint must be relaxed; 2) the \textit{Randomized Quantization Mechanism} (\textsf{RQM})~\citep{rqm}, which randomly maps each input to the closest pair of sampled bins. However, \textsf{RQM} assumes uniformly distributed bins and has only three tunable hyperparameters, and hence has a smaller hyperparameter search space than \textsf{MVU} for achieving a good privacy-accuracy trade-off; 3) the \textit{Poisson Binomial Mechanism} (\textsf{PBM})~\citep{pbm}, which generates unbiased estimators by mapping inputs to a discrete distribution with bounded support. However, \textsf{PBM} is less flexible and attains a worse utility-privacy trade-off than \textsf{RQM} because it has fewer hyperparameters; 4) other DP mechanisms, such as the \textit{Distributed Discrete Gaussian Mechanism}~\citep{dis_gau_fed} and the \textit{Skellam Mechanism}~\citep{skellam}, which are unbiased on unbounded support. However, they must be truncated when combined with secure aggregation protocols, which produces biased outputs.

%when the size of the output alphabet increases, the optimization problem will be too hard to solve such that the unbiasedness constraint have to be relaxed. Randomized Quantization Mechanism (RQM)~\cite{rqm} randomly maps inputs to closest pair of bins which are subsampled from uniformly distributed bins. However, RQM has few hyperparameters to tune in order to achieve better utility-privacy trade-off. 

This paper proposes a novel randomized quantization mechanism that produces discrete, unbiased outputs under a DP guarantee. Importantly, our mechanism ensures unbiasedness regardless of the number of output bits; it is a general framework, and the existing mechanism \textsf{RQM} can be viewed as a special case of ours. Specifically, given a set of quantization bins $B_1 <B_2< \cdots< B_m$, a discrete DP mechanism maps the continuous input $x$ to one of these bins. Our mechanism first samples two bins, one on the left and one on the right side of the input, based on a pre-defined \textit{selection distribution}, and then outputs one of the two bins such that the output is unbiased in expectation. For example, suppose $m=4$ and $x \in [B_2, B_3)$. Our mechanism first randomly selects one bin on the left of $x$ (e.g., $B_1$) and another bin on the right (e.g., $B_3$) according to a selection distribution, then randomly outputs either $B_1$ or $B_3$ while preserving unbiasedness.
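To make the final output step concrete, the following is a minimal sketch rather than the formal definition of the mechanism. It assumes that unbiasedness is enforced conditionally on the sampled pair, and it uses illustrative notation that is not fixed elsewhere in this section: $B_i \le x \le B_j$ with $B_i < B_j$ denote the sampled left and right bins, and $Q(x)$ denotes the output. Under this assumption, the only two-point distribution on $\{B_i, B_j\}$ with mean $x$ is
\[
\Pr\bigl[Q(x) = B_j \mid B_i, B_j\bigr] = \frac{x - B_i}{B_j - B_i},
\qquad
\Pr\bigl[Q(x) = B_i \mid B_i, B_j\bigr] = \frac{B_j - x}{B_j - B_i},
\]
so that
\[
\mathbb{E}\bigl[Q(x) \mid B_i, B_j\bigr]
= \frac{B_i\,(B_j - x) + B_j\,(x - B_i)}{B_j - B_i}
= \frac{x\,(B_j - B_i)}{B_j - B_i}
= x .
\]
For instance, with the hypothetical values $B_i = 0$, $B_j = 1$, and $x = 0.3$, the output is $B_j$ with probability $0.3$ and $B_i$ with probability $0.7$. Averaging over any selection distribution for the pair $(B_i, B_j)$ then preserves $\mathbb{E}[Q(x)] = x$.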
The key is to carefully design selection distributions that maximize the accuracy of the quantized outputs subject to the DP constraint. Although this problem is naturally formulated as a nonlinear constrained optimization problem, we propose a method that reformulates it as a linear program, which can be solved efficiently with standard linear programming tools. Experiments on both synthetic and real data validate the effectiveness of the proposed method. %  show that our mechanism can achieve better accuracy-privacy trade-off than baselines.  

Our contributions can be summarized as follows:
\begin{enumerate}[leftmargin=*]
    \item We propose a family of differentially private quantization mechanisms that generate discrete and unbiased outputs.% for any communication budgets.
    \item We theoretically quantify the privacy and accuracy of the exponential randomized mechanism (\textsf{ERM}), a special case of our proposed mechanism in which the selection distribution is based on the DP exponential mechanism.
    \item We design a linear program to find the optimal selection distribution of our mechanism, resulting in the optimal randomized quantization mechanism
(\textsf{OPTM}), which attains a better accuracy-privacy trade-off.
    \item We conduct experiments on various tasks to show that our mechanisms, both \textsf{ERM} and \textsf{OPTM}, outperform the baselines.
\end{enumerate}
