\section{Experiments}
\label{sec-exp}

This section presents the empirical evaluation of \textsc{mLDP-KDE} on real-world and synthetic data sets.

\subsection{Experimental Setup}

\paragraph{Data Sets}
We employ the following four publicly available real-world data sets and one synthetic data set for performance evaluation.
\begin{itemize}
  \item \textbf{CodRNA} \citep{UzilovKM06} is a collection of RNA genomic sequences.
  \item \textbf{CovType} \citep{misc_covertype_31} comprises different cartographic features of areas located in the Roosevelt National Forest.
  \item \textbf{RCV1} \citep{LewisYRL04} is an archive of categorized newswire stories from Reuters. We embed all the documents into a $100$-dimensional Euclidean space.
  \item \textbf{Yelp}\footnote{\url{https://www.yelp.com/dataset}} includes reviews from users on Yelp. We represent users as $100$-dimensional vectors, which are derived from a user-business rating matrix using NMF.
  \item \textbf{SYN}, created by \texttt{make\_blobs} in the scikit-learn library,\footnote{\url{https://scikit-learn.org/}} comprises isotropic Gaussian blobs. We specify $10$ centroids, each randomly drawn in the range $[-2, 2]^{m}$, and set the standard deviation of each blob to $0.01$.
  We vary the number of data points ($n$ from $10^4$ to $10^6$) and the dimensions ($m$ from $5$ to $50$) to test scalability. By default, we set $n = 10^5$ and $m = 50$.
\end{itemize}
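As a rough sketch of the SYN generation process, assuming that the centroids are drawn uniformly at random (the symbols $\bm{c}_j$, $z_i$, and $\bm{I}_m$ are introduced here for illustration only), each point $\bm{x}_i$ can be viewed as a draw from
\[
  \bm{c}_j \sim \mathrm{Unif}\big([-2, 2]^{m}\big), \quad j = 1, \ldots, 10; \qquad \bm{x}_i \mid z_i \sim \mathcal{N}\big(\bm{c}_{z_i},\, 0.01^2\, \bm{I}_m\big), \quad i = 1, \ldots, n,
\]
where $z_i \in \{1, \ldots, 10\}$ denotes the blob to which $\bm{x}_i$ is assigned and $\bm{I}_m$ is the $m \times m$ identity matrix.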
These data sets are commonly used to benchmark (non-private or private) KDE and clustering methods in the existing literature \citep{ColemanS21, WagnerNM23}.
We also note that they are not tailored to local privacy settings and that specialized data sets for KDE with local privacy are still absent.
Table~\ref{tab:datasets} presents the statistics of these data sets, where $\omega$ is the bandwidth of the $l_2$-LSH kernel and $r$ is the calibration radius in the privacy calculation.
Following \citet{ColemanS20, WagnerNM23}, we set $\omega$ based on the average distance $\overline{d}$ between two points in the data set and adjust it so that the average kernel density is around $0.1$.
Following the common practice of mLDP \citep{FernandesKM21}, the value of $r$ is determined by computing the average distance $\tilde{d}$ from a point to its $100$th nearest neighbor and rounding it to two significant figures.
We also provide additional experiments to evaluate how the value of $r$ affects the performance of \textsc{mLDP-KDE} in Appendix~\ref{appendix-subsec-radius}.
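The two calibration rules above can be sketched as follows, where $k_{\omega}$ denotes the kernel with bandwidth $\omega$ and $\bm{x}_i^{(100)}$ denotes the $100$th nearest neighbor of $\bm{x}_i$ in the data set. Both symbols are shorthand introduced here, the distance is assumed to be Euclidean, and whether the averages are computed exactly or on a sample is not specified:
\[
  \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} k_{\omega}(\bm{x}_i, \bm{x}_j) \approx 0.1,
  \qquad
  r \approx \tilde{d} = \frac{1}{n} \sum_{i=1}^{n} \big\| \bm{x}_i - \bm{x}_i^{(100)} \big\|_2 .
\]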

\input{tables/datasets}

\paragraph{Algorithms and Implementations}
We compare \textsc{mLDP-KDE} with the following five algorithms:
\textsc{RACE} \citep{ColemanS20} is a non-private sketch method for the KDE problem;
\textsc{DM} \citep{DuchiJW13}, \textsc{PM} \citep{WangXYZHSS019}, and \textsc{SW} \citep{LiWLLS20} are LDP methods for numerical data publication and distribution estimation;
\textsc{GI} \citep{AndresBCP13, Alvim0PP18} is a method for preserving location privacy in two-dimensional Euclidean space, which was extended to higher-dimensional Euclidean spaces by \citet{FernandesKM21}.
Since \textsc{SW} only supports one-dimensional data, we extend it to multidimensional data by independently perturbing each dimension of a point using a privacy budget of $\frac{\varepsilon}{m}$.
For \textsc{DM}, \textsc{PM}, \textsc{SW}, and \textsc{GI}, which are not customized for KDE, we employ the following adaptation:
Each client perturbs their data points and sends them to the server; the server computes the KDE for a query point using these perturbed points.
We refer to the above methods with this adapted procedure as \textbf{\textsc{DM-KDE}}, \textbf{\textsc{PM-KDE}}, \textbf{\textsc{SW-KDE}}, and \textbf{\textsc{GI-KDE}}, respectively.
\textsc{DP-KDE} methods \citep{AldaR17, ColemanS21, WagnerNM23} are excluded from the comparison because they are limited to the centralized setting.
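
The multidimensional extension of \textsc{SW} and the adapted baselines can be summarized as follows. This is a sketch that relies on the standard sequential composition property of LDP and on the usual normalized form of KDE; $\tilde{\bm{x}}_i$ (the perturbed report of the $i$-th client) and $k_{\omega}$ (the kernel with bandwidth $\omega$) are notations introduced here for illustration. Perturbing each of the $m$ coordinates with a budget of $\frac{\varepsilon}{m}$ consumes a total budget of
\[
  \sum_{j=1}^{m} \frac{\varepsilon}{m} = \varepsilon,
\]
and the server-side estimate for a query point $\bm{q}$ takes the form
\[
  \widehat{\mathrm{KDE}}_{\mathcal{D}}(\bm{q}) = \frac{1}{n} \sum_{i=1}^{n} k_{\omega}(\bm{q}, \tilde{\bm{x}}_i).
\]
For instance, with $\varepsilon = 5$ and $m = 50$, each coordinate is perturbed with a budget of only $0.1$.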

All algorithms were implemented in Python 3.
All experiments were conducted on a desktop with an Intel\textsuperscript{\textregistered} Core\textsuperscript{\texttrademark} i7-10700K CPU @ 3.0\,GHz and 32\,GB RAM. Each method was run on a single thread.
Our code and data are publicly available at \url{https://github.com/yz2022/mldp-kde}.

\begin{figure*}[t]
  \centering
  \includegraphics[width=0.64\textwidth]{figures/legend_eps.pdf}
  \\
  \vspace{1mm}
  \includegraphics[width=0.194\textwidth]{figures/CodRNA_MSE_epsilon.pdf}
  \hfill
  \includegraphics[width=0.194\textwidth]{figures/CovType_MSE_epsilon.pdf}
  \hfill
  \includegraphics[width=0.194\textwidth]{figures/RCV1_MSE_epsilon.pdf}
  \hfill
  \includegraphics[width=0.194\textwidth]{figures/Yelp_MSE_epsilon.pdf}
  \hfill
  \includegraphics[width=0.194\textwidth]{figures/SYN_MSE_epsilon.pdf}
  \\
  \caption{MSEs for KDE under LDP/mLDP with varying privacy budget $\varepsilon \in \{1, 2.5, 5, \ldots, 20\}$.}
  \label{fig-eps}
\end{figure*}

\begin{figure*}[t]
  \centering
  \includegraphics[width=0.58\textwidth]{figures/legend_size.pdf}
  \\
  \vspace{1mm}
  \includegraphics[width=0.194\textwidth]{figures/CodRNA_MSE_LxR.pdf}
  \hfill
  \includegraphics[width=0.194\textwidth]{figures/CovType_MSE_LxR.pdf}
  \hfill
  \includegraphics[width=0.194\textwidth]{figures/RCV1_MSE_LxR.pdf}
  \hfill
  \includegraphics[width=0.194\textwidth]{figures/Yelp_MSE_LxR.pdf}
  \hfill
  \includegraphics[width=0.194\textwidth]{figures/SYN_MSE_LxR.pdf}
  \\
  \caption{MSEs of \textsc{RACE} and \textsc{mLDP-KDE} for privacy budgets $\varepsilon = 1, 5, 20$ with varying sketch size $L \times R$.}
  \label{fig-size}
\end{figure*}

\begin{figure*}[t]
  \centering
  \includegraphics[width=0.68\textwidth]{figures/legend_eps.pdf}
  \\
  \vspace{1mm}
  \includegraphics[width=0.2\textwidth]{figures/MSE_n_e_1.pdf}
  \hspace{1em}
  \includegraphics[width=0.2\textwidth]{figures/MSE_n_e_5.pdf}
  \hspace{1em}
  \includegraphics[width=0.2\textwidth]{figures/MSE_n_e_20.pdf}
  \\
  \includegraphics[width=0.2\textwidth]{figures/MSE_m_e1.pdf}
  \hspace{1em}
  \includegraphics[width=0.2\textwidth]{figures/MSE_m_e5.pdf}
  \hspace{1em}
  \includegraphics[width=0.2\textwidth]{figures/MSE_m_e20.pdf}
  \\
  \caption{MSEs for KDE on the SYN data set with varying data set size $n$ from $10^4$ to $10^6$ and dimension $m$ from $5$ to $50$.}
  \label{fig-mn-mse}
\end{figure*}

\paragraph{Performance Measures}
For each data set, we randomly choose 100 points to form the query set $\mathcal{Q}$ and use the rest as the data set $\mathcal{D}$.
We evaluate the KDE quality of each method by the mean squared error (MSE) across all queries in $\mathcal{Q}$, that is, $\mathrm{MSE} = \frac{1}{|\mathcal{Q}|} \sum_{\bm{q} \in \mathcal{Q}} (\widehat{\mathrm{KDE}}_\mathcal{D}(\bm{q}) - \mathrm{KDE}_\mathcal{D}(\bm{q}))^2$.
Given the stochastic nature of these methods, we run each experiment ten times with distinct (yet fixed) seeds and report the average result for each measure. 

\paragraph{Parameter Settings}
The values of various parameters were set as follows:
(1) privacy budget $\varepsilon \in \{1, 2.5, 5, \ldots, 20\}$;
(2) sketch height $L$ from $1$ to $1{,}000$ and width $R$ from $2$ to $100$;
(3) bandwidth $\omega$ and radius $r$ on each data set according to Table~\ref{tab:datasets};
(4) confidence parameter $\eta = 0.1$ and group parameter $L' = 1$.
To decide the default values of $L$ and $R$, we run \textsc{mLDP-KDE} with different sketch sizes and use the ones with the lowest MSE for each privacy budget $\varepsilon$.
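In other words, writing $\mathrm{MSE}(L, R; \varepsilon)$ for the MSE of \textsc{mLDP-KDE} with sketch size $L \times R$ at privacy budget $\varepsilon$ (a notation introduced here for illustration), the default sketch size for each $\varepsilon$ corresponds to
\[
  (L^{*}_{\varepsilon}, R^{*}_{\varepsilon}) \in \arg\min_{L,\, R} \ \mathrm{MSE}(L, R; \varepsilon),
\]
where the minimum is taken over the tested values of $L$ and $R$ in item (2).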

\subsection{Experimental Results}

\paragraph{Utility vs. Privacy}
We evaluate the performance of different algorithms in terms of the balance between the level of privacy and the quality of KDE.
Figure~\ref{fig-eps} shows the MSE of the KDE query results returned by each algorithm, with the privacy budget $\varepsilon$ ranging from $1$ to $20$.
For \textsc{mLDP-KDE}, we report the lowest MSE across different sketch sizes for each $\varepsilon$ value on every data set.
For \textsc{RACE}, which does not involve data perturbation, we fix $L = 1{,}000$ and $R = 100$ and represent its result as a horizontal line in each plot.
The difference between \textsc{mLDP-KDE} and \textsc{RACE} highlights the impact of the LSH+GRR mechanism on the quality of KDE.

In general, we observe that all LDP and mLDP algorithms exhibit a reduction in MSEs as the privacy budget $\varepsilon$ increases, indicating more accurate KDEs.
\textsc{mLDP-KDE} significantly and consistently outperforms all LDP and mLDP baselines in terms of privacy-utility trade-offs.
A key factor is that \textsc{mLDP-KDE} provides LDP for each point only w.r.t.~other points within a distance of $r$. This relaxation better preserves the original data distribution and thus yields approximate KDE results of higher quality than the LDP baselines, which must provide a much more stringent guarantee w.r.t.~all possible points.
The fact that \textsc{mLDP-KDE} significantly outperforms \textsc{GI-KDE}, which provides the same mLDP guarantee, indicates that adding noise to the hash values rather than the original data greatly reduces the privacy budget required to achieve the same level of utility.
We find that the KDE quality of \textsc{GI-KDE} is highly dependent on the dimensionality $m$.
This limitation stems from the growth of the expected perturbation distance with increasing $m$, which pushes the perturbed point farther from the original point and thus distorts the data distribution.
For higher-dimensional data sets such as RCV1 and Yelp, the MSEs of \textsc{GI-KDE} decrease more slowly, eventually resulting in estimates that are no better than those of the LDP methods.

\paragraph{Utility vs. Sketch Size}
We test the effect of the sketch size ($L \times R$) on the KDE quality of \textsc{RACE} and \textsc{mLDP-KDE}, sampling $1{,}000$ data points per data set for sketch construction and performing the same $100$ KDE queries as in previous experiments.
Figure~\ref{fig-size} illustrates the MSEs of \textsc{RACE} and \textsc{mLDP-KDE} for privacy budgets $\varepsilon = 1, 5, 20$ and sketch sizes $L \times R$ from $10^1$ to $10^4$ (up to $10^6$ on the Yelp and SYN data sets).
As $L \times R$ increases, the MSE decreases significantly for both \textsc{mLDP-KDE} and \textsc{RACE} across all data sets.
However, the two methods also differ:
for \textsc{RACE}, the MSE eventually stabilizes at a low level as the sketch size increases,
whereas for \textsc{mLDP-KDE}, the MSE rebounds when the sketch size is too large, as the variances from GRR and the correction process outweigh the benefits of using more estimators and wider hash ranges.
Furthermore, with small sketch sizes, the MSEs of both algorithms are comparable due to the correction process during KDE query processing.
Finally, for \textsc{mLDP-KDE}, using a larger sketch size leads to more benefits when the privacy budget is higher.
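The rebound and its dependence on $\varepsilon$ can be illustrated with the standard form of GRR, which may differ in detail from the mechanism used in \textsc{mLDP-KDE} (e.g., in how the budget is calibrated under mLDP). Over a domain of $R$ hash values with budget $\varepsilon$, standard GRR reports the true value with probability $p$ and each of the other values with probability $q$, where
\[
  p = \frac{e^{\varepsilon}}{e^{\varepsilon} + R - 1},
  \qquad
  q = \frac{1}{e^{\varepsilon} + R - 1},
  \qquad
  \frac{1}{p - q} = \frac{e^{\varepsilon} + R - 1}{e^{\varepsilon} - 1}.
\]
Since debiasing a perturbed count divides by $p - q$, the noise is amplified by a factor that grows with $R$ and shrinks with $\varepsilon$. For example, with $\varepsilon = 1$, we have $p \approx 0.73$ for $R = 2$ but $p \approx 0.027$ for $R = 100$, whereas with $\varepsilon = 20$, $p$ remains above $0.999$ for $R = 100$.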

\paragraph{Scalability Test}
We test the scalability of all methods on the SYN data sets, illustrating the MSE results for privacy budgets $\varepsilon = 1, 5, 20$ in Figure~\ref{fig-mn-mse}.
The MSEs of all methods are stable regardless of $n$, indicating that the KDE quality is insensitive to the size of the data set.
This is consistent with Lemma~\ref{lm-unbiasedness} and Theorem~\ref{thm-approx}, which indicate that the variance of \textsc{mLDP-KDE} is independent of $n$.
With varying dimensionality $m$, \textsc{mLDP-KDE} exhibits slightly higher MSEs than \textsc{RACE} but outperforms the other baselines in all cases and remains stable across different dimensions, indicating that \textsc{mLDP-KDE} can scale well to large-scale high-dimensional data sets.
\textsc{GI-KDE} and \textsc{PM-KDE} can achieve a relatively low MSE for very small $m$ but cannot provide reasonable KDE results when $m > 5$.

Extended experimental results are omitted from the main paper due to space limits and are provided in Appendix~\ref{appendix-experiments}.
