We thank the Action Editor and all the reviewers for their time and their valuable feedback, which significantly improved our submitted manuscript. Below, we provide a point-by-point response to each comment in the decision. Due to space limitations in the changes section of the camera-ready submission form, we include only the gist of each comment from the Action Editor; in our responses, where necessary, we note the exact change or the page number of the change. We summarize the significant changes to the manuscript below:
Comment 1: Utility bounds in terms of $(\epsilon, \delta)$-DP.
Response: We thank the Action Editor for the suggestion. We have expanded the remark after Theorem 8 (on page 12) to discuss how to obtain this utility-privacy tradeoff, as follows: “Specifically, suppose that we wish to re-write the conditions stated in Theorems 7 and 8 in terms of $(\epsilon, \delta)$-DP guarantees. We first pick $\mu$, which can be easily converted to $(\epsilon, \delta)$-DP guarantees according to Lemma 2. Then, by assuming that $\mu_s = \mu_p \cdot m_s$, where $m_s$ is a known constant factor, we obtain from Theorem 6 that $\mu = \mu_p \sqrt{s(1 + 2 \cdot m_s^2)}$. We can thus solve for $\mu_p$ and $\mu_s$ as: $\mu_p = \mu/\sqrt{s(1+2 \cdot m_s^2)}$ and $\mu_s = \mu_p \cdot m_s$. These values can then be directly plugged into Theorems 7 and 8 to obtain the accuracy guarantees.” We thank the editor again for this comment, which improves the clarity of our results.
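For concreteness, the budget-splitting step in the remark above can be sketched as follows. This is only an illustration of the arithmetic; the function name and the example values of $s$ and $m_s$ are our own and not from the manuscript.

```python
import math

def split_privacy_budget(mu, s, m_s):
    """Solve for the per-mechanism GDP parameters mu_p and mu_s from a
    total budget mu, the number of selected basis values s, and the
    assumed ratio mu_s = m_s * mu_p, using the Theorem 6 relation
    mu = mu_p * sqrt(s * (1 + 2 * m_s**2))."""
    mu_p = mu / math.sqrt(s * (1 + 2 * m_s ** 2))
    mu_s = m_s * mu_p
    return mu_p, mu_s

# Example: total budget mu = 1.0, s = 5 selected basis values, m_s = 2.0.
mu_p, mu_s = split_privacy_budget(1.0, 5, 2.0)
```

Plugging $\mu_p$ and $\mu_s$ back into the composition formula recovers the total budget $\mu$, which is a quick sanity check on the derivation.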
Furthermore, in the system model section on page 4, $\sigma_{\varepsilon}$ denotes the standard deviation of the additive error (denoted by the vector variable $\mathbf{\varepsilon}$). We refer to this notation $\sigma_{\varepsilon}$ again on page 11. To avoid confusion, we have added the following remark for clarity: “Recall that $\sigma_{\varepsilon}$ is the standard deviation of the additive error in the system model presented in Section 2.”
Comment 2: Reasoning behind choosing GDP over Renyi DP and other DP composition mechanisms.
Response: We thank the Action Editor for this insightful comment. Accordingly, on pages 9-10 of our modified manuscript, we have included an additional comment about the work in [1]. More specifically, we augment our existing response with the following sentence: “Similarly, for non-adaptive privacy mechanisms, GDP has been shown to be optimal (see Theorem 8 in [1]), such that the optimal noise variance can be appropriately identified to obtain a given $(\epsilon, \delta)$-DP guarantee.” We also shortened our description of the comparison result from Dong et al. (2019).
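To illustrate how a $\mu$-GDP guarantee translates into $(\epsilon, \delta)$-DP (the conversion invoked via Lemma 2 above), the standard duality formula of Dong et al. (2019) can be evaluated directly. This sketch is ours, not the paper's Lemma 2, and we assume the usual form $\delta(\epsilon) = \Phi(-\epsilon/\mu + \mu/2) - e^{\epsilon}\,\Phi(-\epsilon/\mu - \mu/2)$:

```python
import math

def gdp_delta(mu, eps):
    """Smallest delta such that mu-GDP implies (eps, delta)-DP,
    following the conversion formula of Dong et al. (2019):
    delta = Phi(-eps/mu + mu/2) - exp(eps) * Phi(-eps/mu - mu/2)."""
    def Phi(x):  # standard normal CDF via the error function
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return Phi(-eps / mu + mu / 2) - math.exp(eps) * Phi(-eps / mu - mu / 2)
```

As expected, for a fixed $\mu$ the attainable $\delta$ shrinks as $\epsilon$ grows, so a single $\mu$ traces out an entire $(\epsilon, \delta)$ tradeoff curve.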
[1] Balle, Borja, and Yu-Xiang Wang. "Improving the gaussian mechanism for differential privacy: Analytical calibration and optimal denoising." International Conference on Machine Learning. PMLR, 2018.
Comment 3: In the privacy analysis of Alg. 3 (Theorem 6), is the DP cost of steps 4-5 included?
Response: We thank the Action Editor for their question. We note that our analysis does already include the DP cost of steps 4-5, since Theorem 6 accounts for $s$ privacy mechanisms with the $\mu_p$ parameter (here we run the algorithm until $s$ basis values are selected).
Comment 4: Proofreading of the document.
Response: Thank you for the suggestions. As requested, we have made the changes mentioned in Comment 4. Furthermore, we have carefully proofread and modified the manuscript.
Comment 5: Discrepancy in the notation of the data matrix X.
Response: We thank the Action Editor for their question. We indeed intend that the matrix $\mathbf{X}$ has $n$ rows and $p$ columns, where $n$ indicates the number of samples/clients and $p$ indicates the number of features. Each feature is thus an $n$-element column vector denoted by $X_i$, where $i \in [p]$, and thus $X = [X_1, \ldots, X_p]$. We also use lower-case $x_i$ to denote sample $i$, which corresponds to the $i$-th row of $X$. However, we do realize that there was some confusion when we said on page 4 of the paper that $(X, y) = \{(x_i, y_i)\}_{i \in \{1, \ldots, n\}}$. To avoid such confusion, we have revised this part (on page 4) as “We now consider $n$ input-output training pairs represented by $(x_i, y_i), i \in \{1, \ldots, n\}$, where each pair belongs to a single distinct client. We stack the row vectors $x_i$ vertically together to form an $n \times p$ matrix $X$. Similarly, we stack $y_i$ into an $n \times 1$ vector $y$.”
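The stacking convention above can be made concrete with a minimal NumPy sketch. The shapes below ($n = 3$ clients, $p = 2$ features) are illustrative values of our own choosing, not from the paper:

```python
import numpy as np

# Illustrative data: n = 3 clients, p = 2 features per client.
x_rows = [np.array([1.0, 2.0]),
          np.array([3.0, 4.0]),
          np.array([5.0, 6.0])]
y_vals = [1.0, 0.0, 1.0]

X = np.vstack(x_rows)                 # n x p matrix: row i is sample x_i
y = np.array(y_vals).reshape(-1, 1)   # n x 1 response vector

# Feature columns X_1, ..., X_p are the columns of X.
X_cols = [X[:, i] for i in range(X.shape[1])]
```

This makes explicit that $x_i$ (a row) and $X_i$ (a column) index different slices of the same matrix, which was the source of the notational confusion.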
Comment 6: Add names of the datasets either to figures or captions.
Response: Thank you for suggesting this change. We have added the names of the datasets to the figure captions.