
\section{Discussion}
Our proposed mechanisms can be generalized to broader settings, including high-dimensional, biased, and dynamic quantization. We discuss these extensions below.

\paragraph{Extension to high-dimensional quantization.}
Beyond entry-wise discretization, our method can also be extended to high-dimensional quantization using an approach similar to that of~\citep{mvu}. Specifically, consider $d$-dimensional input vectors $\textbf{x}=(\textbf{x}_1, \cdots, \textbf{x}_d)$ from a set whose $L_2$ diameter is bounded by $B$, i.e., $\|\textbf{x}-\textbf{x}^{\prime}\|_2 \leq B$ for any two inputs $\textbf{x}, \textbf{x}^{\prime}$. We map each input vector $\textbf{x}$ to $\mathcal{M}_d(\textbf{x})=(\mathcal{M}^{\prime}(\textbf{x}_1), \cdots, \mathcal{M}^{\prime}(\textbf{x}_d))$. Here, the mechanism $\mathcal{M}^{\prime}$ quantizes the scalar in each coordinate and must satisfy $\epsilon$-metric DP, a variant of $\epsilon$-DP which requires that, for any two inputs $x, x^{\prime}$ and any set of possible outputs $S \subseteq \mathrm{Range}(\mathcal{M}^{\prime})$,
\[
\Pr(\mathcal{M}^{\prime}(x) \in S) \leq e^{\epsilon d(x, x^{\prime})} \Pr(\mathcal{M}^{\prime}(x^{\prime}) \in S),
\]
where $d(x, x^{\prime})=|x - x^{\prime}|^2$. Since Lemma 6 in~\cite{mvu} shows that the mechanism $\mathcal{M}_d$ built from an $\epsilon$-metric DP mechanism $\mathcal{M}^{\prime}$ is $\epsilon B^2$-DP and unbiased, we can directly use our method to find the optimal parameters of $\mathcal{M}^{\prime}$, replacing the original privacy constraints with those of $\epsilon$-metric DP.
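The $\epsilon B^2$ bound can be checked directly from the coordinate-wise construction. The following derivation is a sketch for discrete outputs (as produced by quantization), assuming that $\mathcal{M}^{\prime}$ uses independent randomness in each coordinate and that any two inputs satisfy $\|\textbf{x}-\textbf{x}^{\prime}\|_2 \leq B$; it is not a restatement of the proof in~\cite{mvu}. For any output vector $\textbf{y}$,
\begin{align*}
\Pr(\mathcal{M}_d(\textbf{x}) = \textbf{y})
&= \prod_{i=1}^{d} \Pr(\mathcal{M}^{\prime}(\textbf{x}_i) = \textbf{y}_i) \\
&\leq \prod_{i=1}^{d} e^{\epsilon |\textbf{x}_i - \textbf{x}^{\prime}_i|^2} \Pr(\mathcal{M}^{\prime}(\textbf{x}^{\prime}_i) = \textbf{y}_i) \\
&= e^{\epsilon \|\textbf{x} - \textbf{x}^{\prime}\|_2^2} \Pr(\mathcal{M}_d(\textbf{x}^{\prime}) = \textbf{y})
\leq e^{\epsilon B^2} \Pr(\mathcal{M}_d(\textbf{x}^{\prime}) = \textbf{y}),
\end{align*}
and summing over $\textbf{y} \in S$ extends the bound to any set of outputs $S$. Unbiasedness also holds coordinate-wise: if $\mathbb{E}[\mathcal{M}^{\prime}(x)] = x$ for every scalar input $x$, then $\mathbb{E}[\mathcal{M}_d(\textbf{x})] = \textbf{x}$.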

\paragraph{Extension to biased quantization.}
Our unbiased mechanism can be extended to biased quantization, yielding a new trade-off among bias, deviation, and privacy. Instead of randomly outputting either $B_l$ or $B_r$ while enforcing unbiasedness according to Eq.~\eqref{equ:dither} in Section~\ref{subsec:mechanism}, we can use the exponential mechanism to choose between $B_l$ and $B_r$, with the score function set to the negative distance between the input and the output bin. This mechanism produces biased outputs but reduces the privacy loss.
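To make the contrast concrete, the sketch below compares the two ways of choosing between neighboring bins $B_l \leq x \leq B_r$. It assumes that Eq.~\eqref{equ:dither} takes the form of standard unbiased stochastic rounding (output $B_r$ with probability $(x-B_l)/(B_r-B_l)$), and it implements the exponential-mechanism alternative with score $-|x-b|$, whose sensitivity over $x \in [B_l, B_r]$ we take to be $B_r - B_l$. The function names and the per-step budget \texttt{eps\_step} are hypothetical, and the sketch covers only this single rounding step, not the full mechanism.

```python
import math


def dither_probs(x, b_l, b_r):
    """Unbiased stochastic rounding between two bins (assumed form of the dither step).

    Returns (P[output = b_l], P[output = b_r]) so that the expected output equals x.
    """
    p_r = (x - b_l) / (b_r - b_l)
    return 1.0 - p_r, p_r


def exp_mech_probs(x, b_l, b_r, eps_step):
    """Exponential mechanism over {b_l, b_r} with score u(x, b) = -|x - b|.

    Hypothetical sketch: the score sensitivity is taken as (b_r - b_l), the
    largest change of |x - b| when x moves inside [b_l, b_r].
    """
    sensitivity = b_r - b_l
    weights = [math.exp(eps_step * -abs(x - b) / (2 * sensitivity)) for b in (b_l, b_r)]
    total = sum(weights)
    return weights[0] / total, weights[1] / total


def expected_output(probs, b_l, b_r):
    """Mean of the two-point output distribution."""
    return probs[0] * b_l + probs[1] * b_r


if __name__ == "__main__":
    x, b_l, b_r = 0.2, -1.0, 1.0
    unbiased = dither_probs(x, b_l, b_r)
    biased = exp_mech_probs(x, b_l, b_r, eps_step=1.0)
    print("dither:", unbiased, "mean:", expected_output(unbiased, b_l, b_r))
    print("exp. mechanism:", biased, "mean:", expected_output(biased, b_l, b_r))
```

Taken in isolation, the dither step outputs $B_r$ with probability $0$ at $x=B_l$ and with probability $1$ at $x=B_r$, whereas the exponential-mechanism step keeps the ratio of output probabilities within $e^{\texttt{eps\_step}}$ for any two inputs in $[B_l,B_r]$, at the cost of an expected output that no longer equals $x$.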

\paragraph{Extension to dynamic settings.} 
Our method can also be extended to dynamic quantization, where different inputs require quantization mechanisms with different hyperparameters. One potential solution is to integrate existing dynamic quantization strategies, such as selecting the optimal quantization bit-width~\citep{dyn_quan_bit} or the clipping range of activation values~\citep{dyn_quan_act} in a quantized neural network; both methods determine hyperparameters (e.g., number of bins, clipping range) at runtime. Once these hyperparameters are decided and input samples are collected, we can directly apply our algorithm to find the optimal quantization mechanism.
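A minimal sketch of this pipeline is given below, assuming that the runtime strategy supplies a bit-width and that the clipping range is taken from empirical percentiles of the collected samples. This percentile rule is a simple stand-in, not the rule used in~\citep{dyn_quan_bit} or~\citep{dyn_quan_act}, and the helper names are hypothetical. The returned configuration (clipping range, initial bin values, clipped samples) is what would then be handed to our optimization, which is not shown.

```python
import random


def percentile(sorted_vals, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of pre-sorted values."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def runtime_hyperparameters(samples, bit_width, clip_pct=1.0):
    """Hypothetical runtime rule: clip to the [p, 100 - p] percentiles, use 2**bit_width bins.

    Returns the clipping range, evenly spaced initial bin values, and the
    clipped samples that the (not shown) parameter optimization would consume.
    """
    vals = sorted(samples)
    low = percentile(vals, clip_pct)
    high = percentile(vals, 100.0 - clip_pct)
    m = 2 ** bit_width
    bins = [low + k * (high - low) / (m - 1) for k in range(m)]
    clipped = [min(max(v, low), high) for v in samples]
    return {"clip_range": (low, high), "bins": bins, "samples": clipped}


if __name__ == "__main__":
    rng = random.Random(0)
    batch = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    cfg = runtime_hyperparameters(batch, bit_width=2)
    print(cfg["clip_range"], cfg["bins"])
```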

\section{Experiments}\label{sec:res}


Next, we evaluate our two proposed mechanisms: (1) the optimal randomized quantization mechanism (\textsf{OPTM}) from Section~\ref{subsec:gen}, and (2) the exponential randomized mechanism (\textsf{ERM}), a special case of \textsf{OPTM} from Section~\ref{subsec:special}. We use grid search to find the bin values with the best performance.


We conduct three sets of experiments: (i) scalar input quantization; (ii) vector input quantization; and (iii) quantization in stochastic gradient descent (SGD). For each experiment, we compare our mechanisms  with two baselines: 
\begin{itemize}[leftmargin=*]
    \item Randomized quantization mechanism (\textsf{RQM})~\citep{rqm}: a special case of \textsf{OPTM} with uniformly distributed bins, as discussed in Section~\ref{subsec:special}.
    \item Minimum variance unbiased (\textsf{MVU}) mechanism~\citep{mvu}: a mechanism that uses an optimized probability matrix and output alphabet to map quantized inputs to outputs, and finds the optimal bin values via non-linear optimization.
\end{itemize}
For each mechanism, we measure privacy by the differential privacy (DP) parameter $\epsilon$ and accuracy by the mean absolute error (MAE).


\begin{figure*}[htbp]

\begin{subfigure}{0.33\textwidth}
\centering
\includegraphics[width=\linewidth]{uai2024/fig/exp_err_eps0.5.pdf} 
\end{subfigure}
\begin{subfigure}{0.33\textwidth}
\includegraphics[width=\linewidth]{uai2024/fig/exp_err_eps1.0.pdf}
\end{subfigure}
\begin{subfigure}{0.33\textwidth}
\includegraphics[width=\linewidth]{uai2024/fig/exp_err_eps1.5.pdf}
\end{subfigure}
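\caption{Mean absolute error on scalar inputs as a function of the input value $x$, with all mechanisms at the same privacy level ($\epsilon = 0.5$, $1.0$, and $1.5$ from left to right).}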
\label{fig:exp_err}
\end{figure*}

\begin{figure*}[ht]
\begin{subfigure}{0.24\textwidth}
\centering
\includegraphics[width=\linewidth]{uai2024/fig/l1_vec_err.pdf} 
\caption{}\label{fig:l1_vec_err}
\end{subfigure}
\begin{subfigure}{0.247\textwidth}
\includegraphics[width=\linewidth]{uai2024/fig/l2_vec_err.pdf}\caption{}
\label{fig:l2_vec_err}
\end{subfigure}
\begin{subfigure}{0.25\textwidth}
\centering
\includegraphics[width=\linewidth]{uai2024/fig/dp_sgd_cancer.pdf} \caption{}\label{fig:acc_dp_sgd1}
\end{subfigure}
\begin{subfigure}{0.25\textwidth}
\includegraphics[width=\linewidth]{uai2024/fig/dp_sgd_mnist.pdf}\caption{}\label{fig:acc_dp_sgd2}

\end{subfigure}
\caption{(a) Average error on $L_1$-bounded vectors; (b) average error on $L_2$-bounded vectors; (c) training accuracy on the Breast Cancer dataset; (d) training accuracy on the MNIST dataset.}
\label{fig:acc_dp_sgd}
%\vspace{-0.1cm}
\end{figure*}

\subsection{Scalar input} We first evaluate our mechanisms and the baselines on scalar inputs. Finding the optimal parameters of \textsf{OPTM} is inexpensive in our experiments: searching over all hyperparameter combinations (10 candidate values of $\Delta$, 10 candidate bin assignments, and 100 candidate lower/upper bounds on the probabilities) and selecting the one with the best performance takes about 300 seconds on a personal computer (Intel Core i5-10210U CPU, 16 GB RAM).
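The search above is exhaustive over a finite grid. As a rough sanity check, if the 100 probability-bound candidates are counted as (lower, upper) pairs, the grid has $10 \times 10 \times 100 = 10^4$ points, so 300 seconds corresponds to about 30 ms per evaluation. The sketch below shows only the search loop; \texttt{evaluate\_mae} is a hypothetical callback standing in for solving the \textsf{OPTM} program at fixed hyperparameters and returning its error (or \texttt{None} when no feasible solution exists).

```python
import itertools


def grid_search(deltas, bin_assignments, prob_bounds, evaluate_mae):
    """Exhaustive search over all hyperparameter combinations.

    evaluate_mae(delta, bins, bounds) is a hypothetical callback that returns
    the mean absolute error of the optimized mechanism, or None if infeasible.
    """
    best_err, best_params = float("inf"), None
    for delta, bins, bounds in itertools.product(deltas, bin_assignments, prob_bounds):
        err = evaluate_mae(delta, bins, bounds)
        if err is not None and err < best_err:
            best_err, best_params = err, (delta, bins, bounds)
    return best_err, best_params


if __name__ == "__main__":
    # Toy objective (not the OPTM program), only to exercise the loop on a
    # 10 x 10 x 100 grid; combinations with lo >= hi are treated as infeasible.
    def toy_mae(delta, bins, bounds):
        lo, hi = bounds
        return None if lo >= hi else abs(delta - 5) + abs(bins - 3) + (hi - lo)

    grid_bounds = [(i, j) for i in range(10) for j in range(10)]
    print(grid_search(range(10), range(10), grid_bounds, toy_mae))
```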

We first consider inputs $x$ drawn uniformly from $[-1,1]$. Table~\ref{tab:min_avg_err} compares the mean absolute error $\mathbb{E}_{X}(|\mathcal{M}(X)-X|)$ at $\epsilon = 0.5, 1.0, 1.5$ with $m=4$. As expected, \textsf{OPTM} improves the privacy-accuracy trade-off, achieving the lowest error among all mechanisms at every $\epsilon$. \textsf{ERM} performs comparably to \textsf{RQM}. We could not find valid hyperparameters for \textsf{RQM} and \textsf{ERM} at privacy loss $\epsilon=0.5$, so the corresponding entries in Table~\ref{tab:min_avg_err} are marked ``N/A''. Figure~\ref{fig:exp_err} shows the mean absolute error at a finer granularity, for each input value $x$.
The parameters chosen for each mechanism are given in Appendix~\ref{exp_detail}. For a fair comparison, we rescale the input range of \textsf{MVU} to $[-1,1]$ and scale its output alphabet accordingly. The results show that \textsf{ERM} achieves similar and sometimes better utility than \textsf{RQM}, while \textsf{OPTM} achieves lower error than \textsf{RQM} and \textsf{ERM} in most cases, which indicates the effectiveness of the optimization scheme.
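The expectation $\mathbb{E}_{X}(|\mathcal{M}(X)-X|)$ can be estimated by Monte Carlo sampling. The sketch below takes a generic mechanism as a Python callable and, for illustration only, plugs in plain stochastic rounding onto $\{-1, 1\}$ as a stand-in (it is neither \textsf{OPTM} nor any baseline). For this stand-in the expected error at input $x$ is $1-x^2$, so the estimate should approach $2/3$ for $X$ uniform on $[-1,1]$. The function names are hypothetical.

```python
import random


def estimate_mae(mechanism, sample_input, n_samples, rng):
    """Monte Carlo estimate of E_X |M(X) - X|.

    mechanism(x, rng) returns a (random) output for input x;
    sample_input(rng) draws X from the input distribution.
    """
    total = 0.0
    for _ in range(n_samples):
        x = sample_input(rng)
        total += abs(mechanism(x, rng) - x)
    return total / n_samples


def stochastic_rounding(x, rng, b_l=-1.0, b_r=1.0):
    """Stand-in mechanism: unbiased rounding of x in [b_l, b_r] to {b_l, b_r}."""
    p_r = (x - b_l) / (b_r - b_l)
    return b_r if rng.random() < p_r else b_l


if __name__ == "__main__":
    def uniform_input(r):
        return r.uniform(-1.0, 1.0)

    rng = random.Random(0)
    print(estimate_mae(stochastic_rounding, uniform_input, 200_000, rng))  # close to 2/3
```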

We then consider inputs $x$ drawn from a truncated Gaussian distribution that is not known in advance. Specifically, each input is first sampled from a Gaussian distribution with $\mu = 0.5$ and $\sigma \in \{0.1, 0.2, 0.3\}$, and then truncated to $[-1, 1]$. Table~\ref{tab:min_avg_err_gauss} compares the mean absolute error with $m=4$ and $\epsilon=1$. The results show that \textsf{OPTM} can use asymmetric bins to better capture the shape of the underlying distribution. Specifically, for each candidate set of bin values, we use samples collected from the same input distribution to estimate the density function, optimize the parameters with the objective function stated in Theorem~\ref{thm:extend}, and select the bin values that yield the best performance. In comparison, \textsf{MVU} and \textsf{RQM} use uniformly distributed bins for all inputs and hence incur higher errors. The gain from asymmetric bins is larger when the distribution is more concentrated (i.e., with smaller $\sigma$): the error reduction relative to the best baseline is $0.250$ at $\sigma=0.1$ but only $0.028$ at $\sigma=0.3$.
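The sampling and density-estimation step can be sketched as follows. We assume here that truncation to $[-1,1]$ means rejecting out-of-range draws (clipping them to the endpoints would instead place point masses at $\pm 1$), and we use a simple histogram as the density estimate. The estimator, the cell count \texttt{n\_cells}, and the function names are illustrative choices, not necessarily the procedure behind Table~\ref{tab:min_avg_err_gauss}.

```python
import random


def sample_truncated_gaussian(mu, sigma, n, rng, low=-1.0, high=1.0):
    """Draw n samples from N(mu, sigma^2) conditioned on [low, high] (rejection sampling)."""
    out = []
    while len(out) < n:
        x = rng.gauss(mu, sigma)
        if low <= x <= high:
            out.append(x)
    return out


def histogram_density(samples, n_cells, low=-1.0, high=1.0):
    """Piecewise-constant density estimate on [low, high]; integrates to 1."""
    width = (high - low) / n_cells
    counts = [0] * n_cells
    for x in samples:
        k = min(int((x - low) / width), n_cells - 1)  # x == high goes to the last cell
        counts[k] += 1
    return [c / (len(samples) * width) for c in counts]


if __name__ == "__main__":
    rng = random.Random(0)
    for sigma in (0.1, 0.2, 0.3):
        xs = sample_truncated_gaussian(0.5, sigma, 10_000, rng)
        dens = histogram_density(xs, n_cells=20)
        print(sigma, max(dens))  # peak density grows as sigma shrinks
```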


\begin{table}[h]
    \caption{Minimum MAE on scalar inputs drawn uniformly from $[-1,1]$ ($m=4$). \textsf{OPTM} attains the lowest error among all mechanisms. N/A means that we found no valid hyperparameters for \textsf{ERM} and \textsf{RQM} at $\epsilon=0.5$.}
   % \vspace{-0.2cm}
\label{tab:min_avg_err}
    \centering
%\resizebox{0.4\textwidth}{!}{
    \begin{tabular}{cccc}
    \toprule
    $ \mathbb{E}_{X}(|\mathcal{M}(X)-X|)$ & $\epsilon=0.5$ & $\epsilon=1$ & $\epsilon=1.5$ \\
    \midrule
    \textsf{OPTM} & 3.904 & 1.882 & 1.179 \\
    \textsf{MVU} & 3.959 & 1.930 & 1.254 \\
    \textsf{RQM} & N/A & 1.993 & 1.310 \\
    \textsf{ERM} & N/A & 2.216 & 1.304 \\
   \bottomrule
\end{tabular}
%}
%\vspace{-0.3cm}
\end{table}

\begin{table}[h]
    \caption{Minimum MAE on scalar inputs drawn from a truncated Gaussian distribution ($\mu=0.5$, $m=4$, $\epsilon=1$). Our proposed \textsf{OPTM} attains the lowest error among all mechanisms.}
  %  \vspace{-0.2cm}
\label{tab:min_avg_err_gauss}
    \centering
%\resizebox{0.43\textwidth}{!}{
    \begin{tabular}{cccc}
    \toprule
    $ \mathbb{E}_{X}(|\mathcal{M}(X)-X|)$ & $\sigma=0.1$ & $\sigma=0.2$ & $\sigma=0.3$ \\
    \midrule
    \textsf{OPTM} & 1.778 & 1.836 & 1.972 \\
    \textsf{MVU} & 2.053 & 2.052 & 2.002 \\
    \textsf{RQM} & 2.028 & 2.010 & 2.000 \\
   \bottomrule
\end{tabular}
%}
\vspace{-0.1cm}
\end{table}



\subsection{Vector input} We next compare the error of the mechanisms on vector inputs under different privacy parameters $\epsilon$. Hyperparameters of each mechanism are given in Appendix~\ref{exp_detail}. We use bounded random vectors as inputs to simulate the clipped gradients in differentially private stochastic gradient descent (DP-SGD)~\citep{dp-sgd}, which is commonly used to train private machine learning models. Specifically, we generate random vectors of dimension $d=10$ whose coordinates are drawn uniformly from $[-1,1]$, so every vector has $L_1$ norm at most $d=10$.

For each $\epsilon$, we fix the bin values $\{B_1,\ldots,B_m\}$ and find the optimal parameters of each mechanism (e.g., the selection probabilities in \textsf{OPTM} and the parameter $q$ in \textsf{RQM}). We then quantize each coordinate independently, measure the error as the Euclidean distance between the input and output vectors, and repeat this process 10,000 times to compute the average error (see Figure~\ref{fig:l1_vec_err}).
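The evaluation loop for this experiment can be sketched as below. Because the per-mechanism parameters are listed in the appendix, the sketch plugs in plain stochastic rounding onto a fixed, evenly spaced bin set as a stand-in mechanism (not \textsf{OPTM}, \textsf{RQM}, or \textsf{MVU}); the bin values and function names are assumptions made for illustration.

```python
import math
import random


def stochastic_round(x, bins, rng):
    """Stand-in scalar mechanism: unbiased rounding to the two bins around x.

    Assumes bins are sorted and bins[0] <= x <= bins[-1].
    """
    for b_l, b_r in zip(bins, bins[1:]):
        if b_l <= x <= b_r:
            p_r = (x - b_l) / (b_r - b_l)
            return b_r if rng.random() < p_r else b_l
    raise ValueError("input outside the bin range")


def average_vector_error(mechanism, d, trials, rng):
    """Mean and std of ||M(v) - v||_2 over random v with i.i.d. U[-1, 1] coordinates."""
    errors = []
    for _ in range(trials):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        out = [mechanism(x, rng) for x in v]  # each coordinate quantized independently
        errors.append(math.dist(v, out))
    mean = sum(errors) / trials
    std = math.sqrt(sum((e - mean) ** 2 for e in errors) / trials)
    return mean, std


if __name__ == "__main__":
    bins = [-1.0, -1.0 / 3, 1.0 / 3, 1.0]  # m = 4 evenly spaced bins (illustrative)

    def mech(x, r):
        return stochastic_round(x, bins, r)

    print(average_vector_error(mech, d=10, trials=10_000, rng=random.Random(0)))
```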
 
In another experiment (Figure~\ref{fig:l2_vec_err}), we generate random vectors $v\in \mathbb{R}^{100}$ uniformly distributed over the unit ball $\|v\|_2\leq 1$ (e.g., via ball point picking~\citep{ball}). We quantize each vector $v$ and measure the error as the Euclidean distance, again averaging over 10,000 repetitions. We report both the mean and the standard deviation of the error in Figures~\ref{fig:l1_vec_err} and~\ref{fig:l2_vec_err}. In both cases, \textsf{OPTM} achieves lower error than \textsf{RQM} and \textsf{MVU}, indicating that our mechanism effectively reduces the error when privatizing vector inputs.
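For reference, a standard way to sample uniformly from the unit $L_2$ ball is to normalize a standard Gaussian vector, which gives a uniform direction, and scale it by a radius $U^{1/d}$ with $U \sim \mathrm{Uniform}(0,1)$. The sketch below implements this construction; it is one common form of ball point picking, not necessarily the exact routine used for Figure~\ref{fig:l2_vec_err}, and the function name is hypothetical.

```python
import math
import random


def sample_unit_ball(d, rng):
    """Uniform sample from {v in R^d : ||v||_2 <= 1}.

    Direction: normalized standard Gaussian vector (uniform on the sphere).
    Radius: U**(1/d), so that P(||v|| <= r) = r**d, the volume fraction of radius r.
    """
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    radius = rng.random() ** (1.0 / d)
    return [radius * x / norm for x in g]


if __name__ == "__main__":
    rng = random.Random(0)
    v = sample_unit_ball(100, rng)
    print(len(v), math.sqrt(sum(x * x for x in v)))
```

In $d=100$ dimensions the expected norm is $d/(d+1) \approx 0.99$, so almost all sampled vectors lie close to the boundary of the ball.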


\subsection{DP Stochastic Gradient Descent} We further measure the performance of our mechanisms on downstream machine learning tasks by integrating them into the DP-SGD algorithm~\citep{dp-sgd}. Specifically, during each epoch of stochastic gradient descent (SGD), each coordinate of the gradient vector is clipped at a threshold and then quantized by a differentially private mechanism. The experimental parameters are given in Appendix~\ref{exp_detail}. As a non-private reference, we also record the accuracy when gradients are only clipped, without any privacy protection.
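One privatized update can be sketched as follows: clip each gradient coordinate at a threshold, pass each clipped coordinate through a scalar DP quantizer, and take a gradient step. The names \texttt{clip}, \texttt{lr}, and \texttt{quantize} are hypothetical; in these experiments the quantizer would be \textsf{OPTM}, \textsf{RQM}, or \textsf{MVU} with the parameters listed in Appendix~\ref{exp_detail}, whereas the usage below passes the identity, which corresponds to the clipping-only reference.

```python
def clip_coordinates(grad, clip):
    """Clip every coordinate of the gradient to [-clip, clip]."""
    return [min(max(g, -clip), clip) for g in grad]


def private_sgd_step(weights, grad, lr, clip, quantize):
    """One SGD update with coordinate-wise clipping followed by DP quantization.

    quantize(x) is a hypothetical scalar mechanism applied to each clipped
    coordinate independently; passing the identity gives the clipping-only baseline.
    """
    noisy_grad = [quantize(g) for g in clip_coordinates(grad, clip)]
    return [w - lr * g for w, g in zip(weights, noisy_grad)]


if __name__ == "__main__":
    w = [0.0, 0.0, 0.0]
    g = [2.5, -0.3, -4.0]
    # Clipping-only reference (no privacy protection): identity quantizer.
    print(private_sgd_step(w, g, lr=0.1, clip=1.0, quantize=lambda x: x))
```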

In our experiments, we first use DP-SGD to train a softmax regression model on the UCI ML Breast Cancer dataset~\citep{cancer} with 569 samples, recording the accuracy on the training set after training on each batch of data. Results are shown in Figure~\ref{fig:acc_dp_sgd1}. We also train a softmax regression model on the MNIST dataset~\citep{mnist} with 60,000 training images and record the training accuracy; results are shown in Figure~\ref{fig:acc_dp_sgd2}.
On the Breast Cancer dataset, \textsf{OPTM} converges faster than \textsf{RQM} and \textsf{MVU} and reaches accuracy very close to that of the non-private scheme.
On the MNIST dataset, \textsf{OPTM} performs on par with \textsf{MVU} and achieves higher accuracy than \textsf{RQM}. Since the errors introduced by DP mechanisms can slow down convergence, the lower error of our mechanism yields convergence at least as fast as the baselines in both experiments.

 

