\subsection{Theoretical Analysis}
\label{sec-theory}

Next, we analyze the privacy guarantee, approximation bound, and complexity of \textsc{mLDP-KDE}.
Note that all proofs are omitted from the main paper due to space limitations and are provided in Appendix~\ref{appendix-proofs}.

\paragraph{Privacy Analysis}
We start by defining the distance $d_{\mathrm{hash}}: [1, R]^{L} \times [1, R]^{L} \to [0, L]$ between two sequences of hash values as the number of positions at which they differ, i.e., their Hamming distance.
Note that $d_{\mathrm{hash}}(\cdot, \cdot)$ is a metric because it is nonnegative, equals zero only for identical sequences, is symmetric, and satisfies the triangle inequality.
The following lemma shows that the GRR mechanism applied to any sequence of hash values provides mLDP with respect to $d_{\mathrm{hash}}(\cdot, \cdot)$.
\begin{lemma}\label{lm-grr-mldp}
  The GRR mechanism $\mathcal{M}_{\mathrm{GRR}}$ with a privacy parameter $\gamma > 0$ provides $(\gamma d_{\mathrm{hash}}, 0)$-mLDP on a sequence of $L$ integers in the range of $[1, R]$.
\end{lemma}
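As a brief sketch of why Lemma~\ref{lm-grr-mldp} holds, assume the standard form of GRR (an assumption made here for illustration; the authoritative definition is the one used in Algorithm~\ref{alg-1}): each hash value is kept with probability $\frac{e^{\gamma}}{e^{\gamma} + R - 1}$ and replaced by each of the other $R - 1$ values with probability $\frac{1}{e^{\gamma} + R - 1}$, independently across the $L$ positions.
Writing $\bm{u}, \bm{u}' \in [1, R]^{L}$ for two input sequences and $\bm{y} \in [1, R]^{L}$ for an output (notation introduced only for this sketch), we have
\begin{equation*}
  \frac{\Pr[\mathcal{M}_{\mathrm{GRR}}(\bm{u}) = \bm{y}]}{\Pr[\mathcal{M}_{\mathrm{GRR}}(\bm{u}') = \bm{y}]}
  = \prod_{i = 1}^{L} \frac{\Pr[\mathcal{M}_{\mathrm{GRR}}(u_i) = y_i]}{\Pr[\mathcal{M}_{\mathrm{GRR}}(u'_i) = y_i]}
  \leq \prod_{i \,:\, u_i \neq u'_i} e^{\gamma}
  = e^{\gamma d_{\mathrm{hash}}(\bm{u}, \bm{u}')},
\end{equation*}
because the $i$-th factor equals $1$ when $u_i = u'_i$ and is at most $e^{\gamma}$ otherwise.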

For each $\bm{x} \in \mathcal{D}$, the LSH+GRR mechanism that produces $\widehat{H}(\bm{x})$ in Algorithm~\ref{alg-1} is also $(\gamma d_{\mathrm{hash}}, 0)$-mLDP because $H(\bm{x})$ is exactly the input to $\mathcal{M}_{\mathrm{GRR}}$ in Lemma~\ref{lm-grr-mldp}. Formally, for any $\bm{x}, \bm{x}' \in \mathbb{R}^{m}$ and $\bm{y} \in [1, R]^{L}$,
\begin{equation}\label{eq-log-loss}
  \mathcal{L}_{\bm{x}, \bm{x}'} = \ln \big( \tfrac{\Pr[\widehat{H}(\bm{x}) = \bm{y}]}{\Pr[\widehat{H}(\bm{x}') = \bm{y}]} \big) \leq \gamma d_{\mathrm{hash}}\big(H(\bm{x}), H(\bm{x}')\big).
\end{equation}
Then, we define a random variable $X$ for the distribution of $d_{\mathrm{hash}}(H(\bm{x}), H(\bm{x}'))$ over all possible LSH functions in the $2$-stable LSH scheme, and show that $X$ is binomial.
\begin{lemma}\label{lm-EXP-bound}
  Define a random variable $X = d_{\mathrm{hash}}(H(\bm{x}),$ $H(\bm{x}'))$, where the $L$ LSH functions are drawn independently from the $2$-stable LSH scheme. Then, $X$ follows a binomial distribution $\mathcal{B}(L, \tfrac{R - 1}{R} \cdot (1 - k(\bm{x},\bm{x}')))$.
\end{lemma}
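The success probability in Lemma~\ref{lm-EXP-bound} can be read off position by position.
As a sketch, assume that two points collide under a single LSH function with probability $k(\bm{x}, \bm{x}')$ before rehashing, and that rehashing maps distinct LSH values to independent uniform values in $[1, R]$ (both are assumptions about the construction in Algorithm~\ref{alg-1}).
Writing $H_i(\bm{x})$ for the $i$-th entry of $H(\bm{x})$ (notation introduced only for this sketch), we then have
\begin{equation*}
  \Pr[H_i(\bm{x}) \neq H_i(\bm{x}')] = \big(1 - k(\bm{x}, \bm{x}')\big) \cdot \tfrac{R - 1}{R},
  \qquad
  \mathbb{E}[X] = L \cdot \tfrac{R - 1}{R} \cdot \big(1 - k(\bm{x}, \bm{x}')\big),
\end{equation*}
where the expectation follows from the independence of the $L$ LSH functions.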

According to Eq.~\ref{eq-log-loss} and Lemma~\ref{lm-EXP-bound}, the LSH+GRR mechanism is shown to provide mLDP by applying the Chernoff bound \citep{Chernoff52}.
\begin{theorem}\label{thm-mldp}
  The LSH+GRR mechanism in Algorithm~\ref{alg-1} provides $(d_{\chi}, \eta)$-mLDP, where $d_{\chi}(\bm{x},$ $\bm{x}') = \frac{\gamma cL(R - 1)}{\omega R} \cdot d(\bm{x}, \bm{x}') + \gamma \sqrt{\frac{L \ln(1/\eta)}{2}}$ for any $c \geq 0.8$ and $\eta \in (0, 1)$.
\end{theorem}
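A sketch of how the two terms of $d_{\chi}$ may arise is given next; the full argument is in Appendix~\ref{appendix-proofs}.
Assume that $1 - k(\bm{x}, \bm{x}') \leq c \cdot d(\bm{x}, \bm{x}') / \omega$ holds for the $2$-stable LSH scheme, which is one reading of the role of the constant $c$ and is stated here as an assumption.
Lemma~\ref{lm-EXP-bound} then yields $\mathbb{E}[X] \leq \frac{cL(R - 1)}{\omega R} \cdot d(\bm{x}, \bm{x}')$, and the additive (Hoeffding) form of the Chernoff bound for a binomial variable over $L$ trials gives, for any $t > 0$,
\begin{equation*}
  \Pr\big[X \geq \mathbb{E}[X] + t\big] \leq \exp\big(-2t^{2}/L\big) = \eta
  \quad \text{when} \quad
  t = \sqrt{\tfrac{L \ln(1/\eta)}{2}}.
\end{equation*}
Hence, with probability at least $1 - \eta$ over the choice of LSH functions, Eq.~\ref{eq-log-loss} gives $\mathcal{L}_{\bm{x}, \bm{x}'} \leq \gamma X \leq \frac{\gamma cL(R - 1)}{\omega R} \cdot d(\bm{x}, \bm{x}') + \gamma \sqrt{\frac{L \ln(1/\eta)}{2}}$.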

Based on Theorem~\ref{thm-mldp}, we describe how to choose the value of $\gamma$ in Algorithm~\ref{alg-1} w.r.t.~a privacy budget $\varepsilon > 0$.
Since the level of privacy in mLDP varies with $d(\bm{x}, \bm{x}')$, we should calibrate it with a radius $r > 0$.
That is, we require the privacy level to be at most $\varepsilon$ for any $d(\bm{x}, \bm{x}') \leq r$.
As such, it guarantees that a point $\bm{x}$ is $\varepsilon$-indistinguishable from any point $\bm{x}'$ within a ball of radius $r$ centered at $\bm{x}$.
To achieve this, we need to ensure that $d_{\chi}(\bm{x}, \bm{x}') \leq \varepsilon$ when $d(\bm{x}, \bm{x}') \leq r$.
According to Theorem~\ref{thm-mldp}, the value of $\gamma$ in Algorithm~\ref{alg-1} should be
\begin{equation}\label{eq-gamma}
  \gamma \leq \varepsilon/\big(\tfrac{0.8 r L(R - 1)}{\omega R} + \textstyle \sqrt{\frac{L \ln(1/\eta)}{2}}\big).
\end{equation}
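To see how Eq.~\ref{eq-gamma} follows, note that $d_{\chi}(\bm{x}, \bm{x}')$ in Theorem~\ref{thm-mldp} is increasing in $d(\bm{x}, \bm{x}')$, so it suffices to enforce $d_{\chi}(\bm{x}, \bm{x}') \leq \varepsilon$ at $d(\bm{x}, \bm{x}') = r$ with the smallest admissible constant $c = 0.8$:
\begin{equation*}
  \gamma \Big( \tfrac{0.8 r L(R - 1)}{\omega R} + \sqrt{\tfrac{L \ln(1/\eta)}{2}} \Big) \leq \varepsilon .
\end{equation*}
As a purely illustrative calculation with hypothetical values (not taken from any experiment), let $\varepsilon = 10$, $r = 0.1$, $\omega = 1$, $L = 100$, $R = 4$, and $\eta = 0.05$.
Then the first term is $\frac{0.8 \cdot 0.1 \cdot 100 \cdot 3}{4} = 6$, the second term is $\sqrt{50 \ln 20} \approx 12.24$, and Eq.~\ref{eq-gamma} gives $\gamma \leq 10 / 18.24 \approx 0.55$.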
By applying the Chernoff bound with the Kullback--Leibler (KL) divergence, we obtain another mLDP guarantee for the LSH+GRR mechanism.
\begin{corollary}\label{col-mldp}
  Let $p = \frac{R - 1}{R} \cdot (1 - k(\bm{x}, \bm{x}'))$. For any $0 < s < 1 - p$, the LSH+GRR mechanism provides $(d_{\chi}, \eta)$-mLDP, where $d_{\chi}(\bm{x}, \bm{x}') = \gamma L \big( \frac{c (R-1)}{\omega R} \cdot d(\bm{x}, \bm{x}') + s \big)$, $\eta = \exp\big(-L \cdot D_{\mathrm{KL}}(p + s \parallel p)\big)$, and $c \geq 0.8$.
\end{corollary}

Fixing $d(\bm{x}, \bm{x}') = r$ and setting $d_{\chi}(\bm{x}, \bm{x}') = \varepsilon$, we can solve the two equations for $d_{\chi}(\bm{x}, \bm{x}')$ and $\eta$ in Corollary~\ref{col-mldp} using Newton's method to approximate the values of $s$ and $\gamma$.
In practice, we compute the two $\gamma$'s according to Eq.~\ref{eq-gamma} and Corollary~\ref{col-mldp} and use the larger one in Algorithm~\ref{alg-1}.

Finally, we present a worst-case privacy guarantee of the LSH+GRR mechanism that does not depend on the randomness of the $2$-stable LSH scheme.
\begin{corollary}\label{thm-ldp}
  The LSH+GRR mechanism provides $\gamma L$-LDP.
\end{corollary}
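This follows directly from Eq.~\ref{eq-log-loss}: two sequences of $L$ hash values differ in at most $L$ positions, so
\begin{equation*}
  \mathcal{L}_{\bm{x}, \bm{x}'} \leq \gamma d_{\mathrm{hash}}\big(H(\bm{x}), H(\bm{x}')\big) \leq \gamma L
\end{equation*}
for every pair $\bm{x}, \bm{x}'$ and every realization of the LSH functions.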

\paragraph{Approximation Analysis}
According to \citet{ColemanS20}, each initial counter provided by the sketch is an unbiased estimator for LSH kernels.
However, this unbiasedness is lost after rehashing and applying the GRR mechanism because both steps change the data distribution.
To provide an unbiased KDE, we analyze how the rehashing scheme and the GRR mechanism affect the collision probability between a query point $\bm{q}$ and any data point $\bm{x}$, as well as the distribution of each counter, and then correct for these effects to recover the original estimator, as outlined in Algorithm~\ref{alg-2}.
Next, we show that the estimator is unbiased and provide an upper bound on its variance.
\begin{lemma}\label{lm-unbiasedness}
  For the estimator $\widehat{\mathcal{S}}_\mathcal{D}[i, h_i(\bm{q})]$ in Algorithm~\ref{alg-2}, it holds that $\mathbb{E}\big[\widehat{\mathcal{S}}_\mathcal{D}[i, h_i(\bm{q})] \big] = n \mathrm{KDE}_{\mathcal{D}}(\bm{q})$ and
  \begin{multline} \label{estimator-var}
      \mathrm{Var}\big[ \widehat{\mathcal{S}}_\mathcal{D}[i, h_i(\bm{q})] \big] \leq \big(\tfrac{e^\gamma + R - 1}{e^\gamma - 1}\big)^2 \big(\tfrac{R}{R - 1}\big)^2\\ 
      \big(\textstyle \sqrt{\tfrac{e^{\gamma}}{e^{\gamma} + R - 1} -\tfrac{1}{R}} \widetilde{K}(\bm{q}) + \tfrac{1}{\sqrt{R}} \big)^2,
  \end{multline}
  where $\widetilde{K}(\bm{q}) = \sum_{\bm{x} \in \mathcal{D}} \sqrt{k(\bm{x}, \bm{q})}$.
\end{lemma}

By applying Chebyshev's inequality and the Chernoff bound to the output $\widehat{\mathrm{KDE}}_{\mathcal{D}}(\bm{q})$ of Algorithm~\ref{alg-2}, which uses a common median-of-means technique for estimation, we obtain the following theorem for its approximation bound.
\begin{theorem}\label{thm-approx}
  For the sketch $\mathcal{S}_{\mathcal{D}}$ constructed by Algorithm~\ref{alg-1} with $L = O\big((\frac{e^\gamma + R - 1}{e^\gamma - 1})^2 \cdot \frac{\log(1/\eta)}{\alpha^2}\big)$ independent rows, the output $\widehat{\mathrm{KDE}}_{\mathcal{D}}(\bm{q})$ of Algorithm~\ref{alg-2} is guaranteed to be an $(\alpha, \eta)$-approximation of $\mathrm{KDE}_{\mathcal{D}}(\bm{q})$.
\end{theorem}

We note that the constraints on $\gamma$ and $L$ imposed by Eq.~\ref{eq-gamma} for the privacy guarantee and by Theorem~\ref{thm-approx} for the approximation bound may not be simultaneously satisfiable when the privacy parameter $\varepsilon$ is too small.
This is because Eq.~\ref{eq-gamma} imposes an upper bound on $\gamma$, whereas Theorem~\ref{thm-approx} imposes a lower bound on it.
Consequently, the ranges of $\gamma$ required by Eq.~\ref{eq-gamma} and Theorem~\ref{thm-approx} may not overlap.
To eliminate the circular dependence on $\gamma$ and $L$ and thus reconcile Eq.~\ref{eq-gamma} and Theorem~\ref{thm-approx}, we further establish the following approximation bound.
\begin{theorem}\label{thm-circdep}
For the privacy parameter $\varepsilon = \Omega(\frac{\log (1/\eta )}{\alpha ^ {2}})$ and the sketch parameters $L = O(\frac{\log (1/\eta)}{\alpha^{2}})$ and $R = O(1)$, the output $\widehat{\mathrm{KDE}}_{\mathcal{D}}(\bm{q})$ of Algorithm~\ref{alg-2} is guaranteed to be an $(\alpha, \eta)$-approximation of $\mathrm{KDE}_{\mathcal{D}}(\bm{q})$.
\end{theorem}

Theorem~\ref{thm-circdep} implies that the approximation bound of \textsc{mLDP-KDE} might not hold when $\varepsilon = o(\frac{\log(1/\eta)}{\alpha^{2}})$.
In practice, we adopt a privacy-first strategy that determines the values of $\gamma$ and $L$ based on Eq.~\ref{eq-gamma} or Corollary~\ref{col-mldp} to ensure that mLDP is satisfied, although this may result in a smaller $L$ than required by the approximation bound in Theorem~\ref{thm-approx}.
This strategy achieves reasonable empirical performance because the number of rows needed in practice to estimate KDEs with small errors is much lower than the theoretical upper bound, owing to the conservatism of the probability inequalities.

\paragraph{Complexity Analysis}
In Algorithm~\ref{alg-1}, the server generates LSH parameters in $O(mL)$ time. Each user then computes and perturbs the hash values in $O(mL)$ time, followed by the server aggregating these sequences to build $\mathcal{S}_{\mathcal{D}}$ in $O(nL)$ time.
Therefore, the server and each user take $O\big((m + n)L\big)$ (or simply $O(nL)$ as $n \gg m$) and $O(mL)$ time to build $\mathcal{S}_{\mathcal{D}}$, respectively. The total communication cost is $O(mnL)$.
The server and each user use $O(nL)$ and $O(mL)$ space to run Algorithm~\ref{alg-1}, respectively, and the size of $\mathcal{S}_{\mathcal{D}}$ is $O(LR)$.

On receiving a query $\bm{q}$, Algorithm~\ref{alg-2} spends $O(mL)$ time to compute $\widehat{\mathrm{KDE}}_{\mathcal{D}}(\bm{q})$.
The sketch size and query time in the \textsc{mLDP-KDE} framework are both sublinear w.r.t.~$n$ because $L$ and $R$ are independent of $n$.
For comparison, a non-sketch-based KDE method with LDP or mLDP requires only $O(n)$ and $O(m)$ pre-processing time on the server and each user, respectively, and has a lower communication cost of $O(nm)$ in the local computation model.
However, the time and space complexities of processing each query without the sketch both increase significantly to $O(nm)$.
