\documentclass[12pt]{article} % use larger type; default would be 10pt

\usepackage[utf8]{inputenc} % set input encoding (not needed with XeLaTeX)

 

%%% PAGE DIMENSIONS
\usepackage[margin=0.8in]{geometry} % to change the page dimensions
\geometry{letterpaper} % or a4paper (Europe) or a5paper or....

\usepackage{graphicx} % support the \includegraphics command and options
% \usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent

%%% PACKAGES
\usepackage{booktabs} % for much better looking tables
\usepackage{array} % for better arrays (eg matrices) in maths
\usepackage{paralist} % very flexible & customisable lists (eg. enumerate/itemize, etc.)
\usepackage{verbatim} % adds environment for commenting out blocks of text & for better verbatim
\usepackage{mathrsfs}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{amsmath,amsfonts,amssymb,esint}
\usepackage{graphics}
\usepackage{enumerate}
\usepackage{mathtools}
\usepackage{xfrac}
\usepackage{bbm}
\usepackage{subcaption}
\usepackage{times}


% \usepackage[hashEnumerators,smartEllipses]{markdown}

%%% For Hyperlinks inside the PDF file
\usepackage[usenames,dvipsnames]{xcolor}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref}
 


%%% EQUATION Numbering Starts Fresh at each Section
\def\theequation{\thesection.\arabic{equation}}
\numberwithin{equation}{section}


%%% Theorems etc.
\newtheorem{theorem}{Theorem}[section]
\newtheorem{conjecture}[theorem]{Conjecture}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{definition}[theorem]{Definition}
\theoremstyle{definition}
\newtheorem{remark}[theorem]{Remark}
\newtheorem{question}{Question}




\newcommand{\norm}[1]{\left\|#1\right\|}
\newcommand{\abs}[1]{\left|#1\right|}
\DeclarePairedDelimiter{\floor}{\lfloor}{\rfloor}
\newcommand{\average}[1]{\left\langle#1\right\rangle}
\newcommand{\md}[1]{\ (\text{mod}\ #1)}
\newcommand{\res}{\text{Res}}
\newcommand{\jsymb}[2]{\left(\frac{#1}{#2}\right)}
\newcommand*{\supp}{\ensuremath{\mathrm{supp\,}}}
\newcommand*{\Id}{\ensuremath{\mathrm{Id}}}
\newcommand*{\Span}{\ensuremath{\mathrm{span}}}
\renewcommand*{\div}{\ensuremath{\mathrm{div\,}}}
\newcommand*{\op}{\ensuremath{\mathrm{op\,}}}
\newcommand*{\N}{\ensuremath{\mathbb{N}}}
\newcommand*{\Q}{\ensuremath{\mathbb{Q}}}
\newcommand*{\T}{\ensuremath{\mathbb{T}}}
\newcommand*{\Z}{\ensuremath{\mathbb{Z}}}
\newcommand*{\R}{\ensuremath{\mathbb{R}}}
\newcommand*{\CC}{\ensuremath{\mathbb{C}}}
\newcommand*{\tr}{\ensuremath{\mathrm{tr\,}}}
\newcommand{\eps}{\varepsilon}
\newcommand{\OO}{\mathcal O}
\renewcommand{\tt}{\tilde{t}}
\newcommand{\tx}{\tilde{x}}
\newcommand{\ZZ}{\mathcal Z}
\newcommand{\EE}{\mathcal E}
\newcommand{\HH}{\mathcal H}
\newcommand{\PP}{\mathcal P}
\newcommand{\RR}{\mathring R}
\newcommand{\Rbar}{\mathring {\overline R}}
\newcommand{\RSZ}{\mathcal R}
\renewcommand*{\Re}{\ensuremath{\mathrm{Re\,}}}
\renewcommand*{\tilde}{\widetilde}
\renewcommand*{\hat}{\widehat}
\newcommand*{\curl}{\ensuremath{\mathrm{curl\,}}}
\newcommand*{\ad}{\ensuremath{\mathrm{ad\,}}}
\newcommand*{\pa}{\partial}
\newcommand*{\ve}{\varepsilon}
\newcommand{\red}[1]{\textcolor{red}{#1}}
\newcommand{\blu}[1]{\textcolor{blue}{#1}}
\newcommand{\vlad}[1]{\textcolor{magenta}{#1}}
\DeclareMathOperator{\sign}{sign}
\DeclareMathOperator{\dist}{dist}
\DeclareMathOperator{\Ric}{Ric}
\newcommand{\WW}{\ensuremath{\mathbb{W}}}
\newcommand{\Proj}{\ensuremath{\mathbb{P}}}\newcommand{\les}{\lesssim}
\newcommand*\diff{\mathop{}\!\mathrm{d}}
\newcommand*\Diff[1]{\mathop{}\!\mathrm{d^#1}}
\renewcommand{\d}{{\bf d}}
\renewcommand{\u}{{\bf u}}
\newcommand{\x}{ {\bf x}}
\newcommand{\J}{ {\bf J}}
\renewcommand{\S}{\mathbb{S}}
\newcommand{\n}{ {\bf n}}
\newcommand{\f}{ {\bf f}}
\newcommand{\bbeta}{{\pmb\beta}}
\newcommand{\bW}{{\bf W}}
\newcommand{\bJ}{{\bf J}}
\newcommand{\rd}{{\rm d}}
\newcommand{\dt}{\rd t}
\newcommand{\cD}{\mathcal{D}}
\newcommand{\cL}{\mathcal{L}}
\newcommand{\cE}{\mathcal{E}}
\everymath{\displaystyle}
\DeclareMathOperator{\argmin}{argmin}
\DeclareMathOperator{\GW}{GW}
\DeclareMathOperator{\IGW}{IGW}
\DeclareMathOperator{\diag}{diag}
\title{\bf Rebuttal for r2SGLD ICML 2024}
\author{Hengrong Du
\thanks{Department of Mathematics, Vanderbilt University.}}
\date{\today}

\usepackage[nottoc,notlot,notlof]{tocbibind}
\allowdisplaybreaks

\newcommand{\haoyang}[1]{{\color{red}[Haoyang: #1]}} 
\newcommand{\wei}[1]{{\color{blue}[wei: #1]}} 

\begin{document}
\maketitle

\textbf{Reviewer gbJB}



Thanks for your valuable comments and suggestions.

\textbf{Unclear applications}


Access to diverse data is essential for model performance. Yet such data are often stored privately across platforms such as mobile devices, hospitals, and data centers, which limits the ability to train robust models. Our Federated-Averaging Langevin Dynamics (FA-LD) algorithm introduces a sampling framework that mitigates privacy and communication barriers while making use of more training data. Our work has also led to several follow-up studies, such as [1,2], which indicate promising prospects for future applications.


[1] Plassier, Durmus, Moulines. Federated Averaging Langevin Dynamics: Toward a unified theory and new algorithms. AISTATS'23.

[2] Federated Sampling with Langevin Algorithm under Isoperimetry. TMLR'24.



\textbf{Reviewer mPaF}

Thanks for your valuable comments and suggestions.


\textbf{Novelty of the proof}

We built the first sampling framework in federated learning with distributed clients, which has paved the way for several follow-up works, such as [1,2]. Extending sampling to distributed clients is non-trivial because of a key challenge: the center parameter $\theta_k$ is not accessible at most iterations, namely whenever $k \bmod K \neq 0$, where $K\in \mathbb{N}^+$ is the number of local steps. To address this, the main novelty of our work is the dominated contraction property in Lemma 4.4, which accommodates a discrepancy between the local client parameter $\theta^c$ and the center parameter $\theta$. This lemma integrates directly into our sampling framework and allows coupling techniques to be used for the convergence analysis across different numbers of local steps.
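
For orientation, local Langevin steps with periodic averaging can be sketched as follows. This is a simplified schematic in generic notation that is assumed here for illustration and is not the exact scheme analyzed in the paper: $\eta$ denotes a step size, $p_c$ client weights summing to one, $f^c$ the local potential of client $c$, and $\xi^c_k$ standard Gaussian noise.
\begin{align*}
    \theta^c_{k+1} &= \theta^c_k - \eta \nabla f^c(\theta^c_k) + \sqrt{2\eta}\,\xi^c_k && \text{(local step on client $c$)},\\
    \theta_k &= \sum_{c} p_c\, \theta^c_k && \text{(formed by the center only when $k \bmod K = 0$)}.
\end{align*}
Between synchronizations, the average $\theta_k$ exists only as an auxiliary quantity in the analysis, which is why the gap between $\theta^c_k$ and $\theta_k$ has to be controlled.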

Given the significant communication bottleneck in federated learning, Theorem 4.7 shows that setting the number of local steps too small (e.g., $K=1$) or too large is inefficient in terms of communication. Our results indicate that the optimal number of local steps $K$ should be of order $\Omega(\sqrt{T_{\epsilon}})$, where $T_{\epsilon}$ is the number of iterations required to attain $\epsilon$ accuracy in the $W_2$ distance. These theoretical insights agree with our empirical observations, as illustrated in Figure 1a.
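
As a rough illustration, assume that clients synchronize once every $K$ iterations, so that reaching $\epsilon$ accuracy costs about $T_{\epsilon}/K$ communication rounds. The value $T_{\epsilon} = 10^4$ below is hypothetical and chosen only to make the arithmetic concrete:
\begin{align*}
    K = 1: &\quad T_{\epsilon}/K = 10^4 \text{ rounds},\\
    K = \sqrt{T_{\epsilon}} = 100: &\quad T_{\epsilon}/K = 100 \text{ rounds}.
\end{align*}
This back-of-the-envelope count deliberately ignores the dependence of $T_{\epsilon}$ on $K$, which is what makes an overly large $K$ inefficient as well.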



[1] Plassier, Durmus, Moulines. Federated Averaging Langevin Dynamics: Toward a unified theory and new algorithms. AISTATS'23.

[2] Federated Sampling with Langevin Algorithm under Isoperimetry. TMLR'24.


\textbf{Reviewer 5TzZ}

\textbf{My major concern is the convergence rates seem to be not quite strong}

Unlike the concise analytical results available for federated SGD, the inclusion of Brownian motion in federated sampling introduces many non-trivial terms into the analysis. For example, a comparable result, Theorem 1 in [1], shows that
\begin{align*}
    W_2^2(\mu_k, \pi) \lesssim (1-\gamma m/8)^k I(\mu_0) + \frac{\gamma^e}{b} \frac{d}{1+d/b} + \gamma V_{\pi} + \frac{\gamma^2 (1-p_c)}{p_c^2}\Bigl\{H+p_c V_{\star} + \frac{d}{b}\Bigr\}+ \frac{\gamma (1-\pi)(1-b^{-1})d}{p_c}.
\end{align*}

These additional terms make a precise analysis of the optimal number of local steps challenging. Our work was the first to study sampling in federated learning and has inspired subsequent studies such as [1,2]. We prioritized a unified and clear interpretation of Theorem 4.7 over a meticulous optimization of the local steps. Therefore, in deriving Eq.~(25) and the subsequent equations, we opted for simplifications that improve interpretability at the expense of optimality. We view the resulting looseness as an artifact of the presentation rather than a flaw in the proof.

We thank the reviewer for pointing this out and will include further discussion in the revision.




\textbf{The paper considers only the strongly convex problems.}

We acknowledge that strong convexity is restrictive. However, since ours is the first sampling method in federated learning, we believe that an analysis under strong convexity is a standard and important starting point in the sampling community. Moreover, a follow-up to our work [2] has extended our results to slightly more general targets that satisfy a log-Sobolev inequality, which may enable broader practical applications.

\textbf{In Assumption H.1, the paper assumes uniformly bounded sensitivity.}

We introduced the first differential privacy guarantee for federated sampling algorithms. We acknowledge that the assumptions made are not optimal, and we plan to refine them in future work.

[1] Plassier, Durmus, Moulines. Federated Averaging Langevin Dynamics: Toward a unified theory and new algorithms. AISTATS'23.

[2] Federated Sampling with Langevin Algorithm under Isoperimetry. TMLR'24.

\textbf{Reviewer hZQQ}

Thanks for your valuable comments and suggestions.

\textbf{Any connection between the optimal tuning of $\rho$ and Theorem 4.9}

It is known that a larger variance of the Gaussian noise leads to better privacy protection. Setting $\rho=0$ is computationally optimal but not optimal for differential privacy. The specific differential privacy guarantees are given in Theorem 4.11, where the quantity $\epsilon_1$ captures the trade-off between privacy and utility.
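
For intuition, the classical Gaussian mechanism makes the link between noise variance and privacy explicit. The symbols below are generic and are not the quantities of Theorem 4.11: $\Delta$ is the $\ell_2$ sensitivity of a query, $\sigma$ is the noise standard deviation, and $(\varepsilon, \delta)$ are the privacy parameters. For $\varepsilon \in (0,1)$, adding $\mathcal{N}(0, \sigma^2 I)$ noise to the query output is $(\varepsilon, \delta)$-differentially private whenever
\[
    \sigma \geq \frac{\Delta \sqrt{2 \ln(1.25/\delta)}}{\varepsilon},
\]
so for fixed $\delta$ the attainable $\varepsilon$ scales like $\Delta/\sigma$ and decreases as the noise grows.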

\bibliography{ICML}
\bibliographystyle{plain}

\end{document}


