\section{Background}
\label{sec:background}
\paragraph{Federated learning.} The objective of federated learning~\citep{mcmahan2017communication} is to train a machine learning model in a distributed fashion without centrally collecting the training data. In detail, let $f_{\bw}$ be the \emph{global federated model} parameterized by $\bw$, and consider a supervised learning setting that optimizes $\bw$ by minimizing the total loss $\ell$ over the training set $\calD_{\rm train}$: $\sum_{(\bx, y)\in \calD_{\rm train}}\ell(f_{\bw}(\bx), y)$. In centralized learning, this is typically done by computing a stochastic gradient $\frac{1}{B} \sum_{i=1}^B \nabla_{\bw}\ell(f_{\bw}(\bx_i), y_i)$ over a randomly drawn batch of data $(\bx_1,y_1),\ldots,(\bx_B,y_B)$ and updating $\bw$ with stochastic gradient descent.

In federated learning (FL), the training set $\calD_{\rm train}$ is not collected centrally; it remains distributed across multiple clients, while the model $f_{\bw}$ is stored on a central server. At each iteration, the server transmits the model parameters $\bw$ to a set of $B$ clients, each of which locally computes the gradient on its own sample, yielding the per-sample gradients $\{ \nabla_\bw \ell(f_{\bw}(\bx_i), y_i) \}_{i=1}^B$. The server and clients then execute a \emph{federated aggregation} protocol to compute the average gradient for the gradient descent update. A major advantage of FL is data privacy: clients do not disclose their data explicitly, but only send their gradients $\nabla_\bw \ell(f_{\bw}(\bx_i), y_i)$ to the server. Techniques such as secure aggregation~\citep{bonawitz2016practical} and differential privacy~\citep{dwork2006calibrating, dwork2014algorithmic} can further reduce the risk of privacy leakage from sending these gradient updates.
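
As a minimal sketch of one such iteration, assume plain gradient averaging with a learning rate $\eta$ and an iteration index $t$ (both are notation introduced here for illustration only). The server update can then be written as
\begin{equation}
    \bw^{(t+1)} = \bw^{(t)} - \eta \cdot \frac{1}{B} \sum_{i=1}^B \nabla_\bw \ell(f_{\bw^{(t)}}(\bx_i), y_i),
\end{equation}
where the $i$-th term of the sum is computed locally by client $i$ and the gradient is evaluated at the current parameters $\bw^{(t)}$.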

\paragraph{Gradient inversion attack.} Despite the promise of data privacy in FL, recent work has shown that the heuristic of sending gradient updates instead of the training samples themselves in fact provides a false sense of security. \citet{zhu2019deep} showed in their seminal paper that it is possible for the server to recover the full batch of training samples given only the aggregated gradients. These \emph{optimization-based} gradient inversion attacks operate by optimizing a set of \emph{dummy data} $\tilde{\bx}_1,\ldots,\tilde{\bx}_B$ and labels $\tilde{y}_1,\ldots,\tilde{y}_B$ so that their gradients match the observed gradients, by minimizing the cost function:
\begin{equation}
    \label{eq:opt_objective}
    \min_{\tilde{\bx}, \tilde{\by}} \left\| \sum_{i=1}^B \nabla_\bw \ell(f_{\bw}(\tilde{\bx}_i), \tilde{y}_i) - \sum_{i=1}^B \nabla_\bw \ell(f_{\bw}(\bx_i), y_i) \right\|_2^2
\end{equation}

For image tasks, since \autoref{eq:opt_objective} is differentiable with respect to $\tilde{\bx}_i$ and $\tilde{y}_i$, and the model parameters $\bw$ are accessible to the server, the server can simply optimize \autoref{eq:opt_objective} using gradient-based search. Doing so yields recovered samples $(\tilde{\bx}_i,\tilde{y}_i)$ that closely resemble the actual samples $(\bx_i,y_i)$ in the batch. In practice, this approach is highly effective, and follow-up works have proposed several optimizations to further improve its recovery accuracy~\citep{geiping2020inverting, yin2021see, jeon2021gradient}.
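
Concretely, write $J(\tilde{\bx}, \tilde{\by})$ for the objective in \autoref{eq:opt_objective}. One step of such a search could then take the following form, where the symbol $J$ and the step size $\alpha$ are notation assumed here for illustration, each $\tilde{y}_i$ is treated as a continuous (relaxed) label, and plain gradient descent stands in for whichever optimizer a particular attack actually uses:
\begin{equation}
    \tilde{\bx}_i \leftarrow \tilde{\bx}_i - \alpha \nabla_{\tilde{\bx}_i} J(\tilde{\bx}, \tilde{\by}),
    \qquad
    \tilde{y}_i \leftarrow \tilde{y}_i - \alpha \nabla_{\tilde{y}_i} J(\tilde{\bx}, \tilde{\by}),
    \qquad i = 1,\ldots,B.
\end{equation}
Since $J$ itself contains the gradient $\nabla_\bw \ell$, each step differentiates through that gradient and therefore requires second-order (mixed) derivatives of the loss.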

For language tasks, this optimization problem is considerably more complex since the samples $\bx_1,\ldots,\bx_B$ are sequences of discrete tokens, and optimizing \autoref{eq:opt_objective} amounts to solving a discrete optimization problem. To circumvent this difficulty, \citet{zhu2019deep} and \citet{deng2021tag} instead optimize the \emph{token embeddings} to match the observed gradient and then map the recovered embeddings to their closest tokens in the embedding layer to recover the private text. In contrast, \citet{gupta2022recovering} leveraged the insight that the gradient of the token embedding layer can be used to recover exactly the set of tokens present in the training sample, and used beam search to optimize the ordering of these tokens for fluency to recover the private text.
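
Both ideas can be sketched for a single sample $\bx = (x_1,\ldots,x_T)$ of $T$ tokens. Let $E$ denote the token embedding matrix with one row $E_v$ per vocabulary token $v$, and let $\mathbf{e}_t = E_{x_t}$ be the embedding at position $t$. The symbols $E$, $T$ and $\mathbf{e}_t$ are notation assumed here for illustration, and the sketch further assumes that $E$ is used only for the input lookup, i.e., it is not shared with the output layer. In the embedding-based approach, a recovered embedding $\tilde{\mathbf{e}}_t$ is mapped back to a token by a nearest-neighbor search, for instance
\begin{equation}
    \hat{x}_t = \arg\min_{v} \left\| \tilde{\mathbf{e}}_t - E_v \right\|_2 .
\end{equation}
For the token-set insight, note that the row $E_v$ enters the forward pass only at positions where $x_t = v$, so by the chain rule
\begin{equation}
    \nabla_{E_v} \ell = \sum_{t \,:\, x_t = v} \frac{\partial \ell}{\partial \mathbf{e}_t} .
\end{equation}
The sum is empty, and hence the gradient row is exactly zero, for every token $v$ that does not occur in the sample. Under the stated assumption, the rows of the embedding gradient that are nonzero therefore reveal which tokens occur in $\bx$, although not their order.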

%\paragraph{Gradient inversion attack in CV.}
%In computer vision, both data $\bx$ and gradient $\ell(f_{\bw}(\bx), y)$ are high-dimensional vectors, which increases the difficulty of optimizing \autoref{eq:opt_objective}.
%To address this issue, IG~\citep{geiping2020inverting} changes the mean-square error in \autoref{eq:opt_objective} to cosine similarity, and adds total variation regulation to the optimization target as an image prior according to the assumption that local regions in images should be smooth.
% The optimization objective becomes:
% \begin{equation}
%     \label{eq:ig}
%     \min_{\tilde{\bx}} \cos(\nabla_\bw \ell(f_{\bw}(\tilde{\bx}), \tilde{y}) - \nabla_\bw \ell(f_{\bw}(\bx), y)) + \alpha_{TV}R_{TV}(\tilde{\bx}) .
% \end{equation}
%STG~\citep{yin2021see} studies label restoration in the batched training setting, and further improves image fidelity with two additional optimization targets: consistency in batch normalization layer statistics, and consistency in optimization results computed with different random seeds.
% :
% \begin{align}
%     \label{eq:stg}
%     \min_{\tilde{\bx}} \left\| \nabla_\bw \ell(f_{\bw}(\tilde{\bx}), \tilde{y}) - \nabla_\bw \ell(f_{\bw}(\bx), y)) \right\|_2 &+ \alpha_{TV}R_{TV}(\tilde{\bx}) + \alpha_{BN}R_{BN}(\tilde{\bx}) \nonumber \\
%     &+ \alpha_{group}R_{group}\|\tilde{\bx} - \mathbb{E}(\tilde{\bx})\|_2 .
% \end{align}
%GI-GIP~\citep{jeon2021gradient} extends STG with a generative model $G$ pretrained on a similar distribution of data. Instead of optimizing on data $\bx$ directly, GI-GIP optimizes a latent embedding $\tilde{\bz}$ and generates data with $\tilde{\bx} = G(\tilde{\bz})$ in order to utilize the image prior from the generative model.


%\paragraph{Gradient inversion attack in NLP.} 
%TAG~\citep{deng2021tag} focuses on gradient inversion for text transformer models.
%It optimizes the word embeddings to match the received gradients.
%Different weights are set for gradients at different layers when calculating the reconstruction loss between the true data gradient and the gradient from reconstructed data.
%FILM~\citep{gupta2022recovering} doesn't optimize the dummy data to match the received gradient. It first obtains the bag of words from the gradient of word embeddings and then utilize the language model to order the words to sentences.
% Mathematically, in this method the objective function to recover one sentence is:
% \begin{equation}
%     \label{eq:opt_objective_nlp}
%     \min_{\tilde{\bx}}  \left\| \nabla_\bw \ell(f_{\bw}(\tilde{\bx}), \tilde{y}) - \nabla_\bw \ell(f_{\bw}(\bx), y) \right\|_2^2 + \alpha(f_{\bw}) \left\| \nabla_\bw \ell(f_{\bw}(\tilde{\bx}), \tilde{y}) - \nabla_\bw \ell(f_{\bw}(\bx), y) \right\|_1 ,
% \end{equation}
% where $\alpha(f_{\bw})$ denotes the weights at different dimensions and has the same shape as $\bw$.

\paragraph{Gradient inversion under the malicious server setting.} The aforementioned gradient inversion attacks operate under the \emph{honest-but-curious} setting where the server faithfully executes the federated learning protocol, but attempts to extract private information from the observed gradients. \citet{fowl2021robbing}, \citet{boenisch2021curious} and \citet{fowl2022decepticons} consider a stronger \emph{malicious server} threat model, which allows the server to transmit arbitrary model parameters $\bw$ to the clients. Under this threat model, it is possible to carefully craft the model parameters such that the training sample can be recovered exactly from its gradient even when the batch size $B$ is large. While this setting is certainly realistic and relevant, our paper operates under the weaker honest-but-curious threat model.
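
A standard observation helps illustrate why control over $\bw$ is so powerful. It is offered here as a simplified sketch under stated assumptions, not as the exact construction used in the cited works. Suppose the model begins with a fully connected layer $\mathbf{z} = W\bx + \mathbf{b}$ applied directly to the input, where $W$, $\mathbf{b}$ and $\mathbf{z}$ are notation assumed here for illustration. For a single sample, the chain rule gives, for each row $j$,
\begin{equation}
    \nabla_{W_j} \ell = \frac{\partial \ell}{\partial z_j} \, \bx^\top,
    \qquad
    \nabla_{b_j} \ell = \frac{\partial \ell}{\partial z_j},
\end{equation}
so that $\bx^\top = \nabla_{W_j} \ell \,/\, \nabla_{b_j} \ell$ whenever $\nabla_{b_j} \ell \neq 0$. For a batch, both gradients are sums over the $B$ samples and the same ratio generally yields a mixture of inputs; roughly speaking, a server that chooses $\bw$ freely can try to arrange that only a single sample contributes to a given row.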