\section{Problem Setting} \label{sec:prelim}
We consider the problem of classifying data of the form $\boldsymbol{\xi} = (\boldsymbol{x},y)$ distributed over $G$ clients, where $\boldsymbol{x} \in \mathcal{X} \subseteq \mathbb{R}^P$ is the feature vector and $y \in \{-1,+1\}$ is the label. For such data, a commonly used transportation cost function is
\begin{equation}\label{eq:cost_func}
d(\boldsymbol{\xi},\boldsymbol{\xi}^{\prime}) \coloneqq \|\boldsymbol{x} - \boldsymbol{x}^{\prime}\| + \kappa \mathbbm{1}_{\{y \neq y^{\prime}\}},
\end{equation}
where $\|\cdot\|$ is any norm on $\mathbb{R}^P$ and $\kappa$ is a hyperparameter that sets the cost of flipping a label, so that transporting a sample to a point with the opposite label incurs the additional cost $\kappa$.
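To make \eqref{eq:cost_func} concrete, the following minimal Python sketch evaluates the transportation cost between two labelled samples. The function name \texttt{transport\_cost} and the use of the \texttt{ord} argument of NumPy's \texttt{numpy.linalg.norm} to select the norm are illustrative assumptions, not part of the formulation.

```python
import numpy as np

def transport_cost(x, y, x_prime, y_prime, kappa, ord=2):
    """Hypothetical helper: d(xi, xi') = ||x - x'|| + kappa * 1{y != y'}.

    `ord` is forwarded to numpy.linalg.norm and selects the norm on R^P
    (2 = Euclidean, 1 = l1, np.inf = max norm); which norm to use is an
    assumption left to the modeller.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    feature_cost = np.linalg.norm(diff, ord=ord)  # ||x - x'||
    label_cost = kappa if y != y_prime else 0.0   # kappa * 1{y != y'}
    return feature_cost + label_cost
```

For example, with $\boldsymbol{x} = (0,0)$, $\boldsymbol{x}^{\prime} = (3,4)$, opposite labels and $\kappa = 2$, the Euclidean cost is $5 + 2 = 7$, whereas equal labels give only the feature term $5$.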

We perform classification with a binary support vector machine (SVM), characterized by the hinge loss $\ell_H(\boldsymbol{w};\boldsymbol{\xi}) = \max\{0,1-y \cdot \boldsymbol{w}^{\mathsf{T}}\boldsymbol{x}\}$ with parameter vector $\boldsymbol{w} \in \mathbb{R}^P$. We choose the SVM classifier because it is a well-established model that is commonly used for fault classification \citep{dutta2023,josey2018}. Moreover, its simple formulation allows a DR version to be derived rigorously.
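The hinge loss is convex but not differentiable where $y \cdot \boldsymbol{w}^{\mathsf{T}}\boldsymbol{x} = 1$, which is why subgradients rather than gradients are the natural quantity for clients to compute. The sketch below evaluates $\ell_H$ and returns one valid subgradient with respect to $\boldsymbol{w}$; the function names \texttt{hinge\_loss} and \texttt{hinge\_subgradient} are hypothetical and chosen only for illustration.

```python
import numpy as np

def hinge_loss(w, x, y):
    """l_H(w; (x, y)) = max{0, 1 - y * w^T x} for a label y in {-1, +1}."""
    margin = y * np.dot(w, x)
    return max(0.0, 1.0 - margin)

def hinge_subgradient(w, x, y):
    """Return one element of the subdifferential of l_H with respect to w.

    If y * w^T x < 1 the loss is locally linear with gradient -y * x.
    If y * w^T x >= 1 we return 0; at the kink (margin exactly 1) the
    subdifferential is {-t * y * x : t in [0, 1]}, which contains 0.
    """
    x = np.asarray(x, dtype=float)
    if y * np.dot(w, x) < 1.0:
        return -y * x
    return np.zeros_like(x)
```

Returning $\boldsymbol{0}$ at the kink is only one admissible choice; any $-t\,y\,\boldsymbol{x}$ with $t \in [0,1]$ would be equally valid there.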

We study the FL setting in which clients can communicate only with the central server and not with each other. Clients do not share their data with the central server, but they can transmit information derived from locally trained models, such as local (sub)gradients or model parameters. We assume that each client $g$ holds a local training set $\mathcal{S}_g = \{\widehat{\boldsymbol{\xi}}_{n_g}\}_{n_g=1}^{N_g} = \{(\widehat{\boldsymbol{x}}_{n_g},\widehat{y}_{n_g})\}_{n_g=1}^{N_g}$ of $N_g$ samples drawn IID from the true distribution $\mathbb{P}_g$. We denote the corresponding empirical distribution by $\widehat{\mathbb{P}}_{N_g} = \frac{1}{N_g}\sum_{n_g=1}^{N_g} \delta_{\widehat{\boldsymbol{\xi}}_{n_g}}$, where $\delta_{\boldsymbol{\xi}}$ is the Dirac measure at $\boldsymbol{\xi}$. Finally, the total number of training samples across all clients is $N = \sum_{g=1}^{G} N_g$.
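The communication pattern above can be illustrated with a plain, non-robust federated subgradient step on the empirical hinge risk. This is a minimal sketch under our own assumptions, not the DR method developed in this paper: each client averages hinge subgradients over its empirical distribution $\widehat{\mathbb{P}}_{N_g}$ and sends only that $P$-dimensional vector, and the server combines the vectors with weights $N_g/N$. The names \texttt{local\_subgradient}, \texttt{federated\_step} and \texttt{step\_size} are hypothetical.

```python
import numpy as np

def local_subgradient(w, X, y):
    """Client side: average hinge-loss subgradient over the local empirical distribution.

    X has shape (N_g, P) and y has shape (N_g,) with entries in {-1, +1}.
    Only the returned P-dimensional vector is sent to the server, never (X, y).
    """
    margins = y * (X @ w)
    active = margins < 1.0  # samples with positive hinge loss
    # Sum of -y_n * x_n over active samples, divided by N_g (inactive samples contribute 0).
    return -(y[active][:, None] * X[active]).sum(axis=0) / X.shape[0]

def federated_step(w, clients, step_size):
    """Server side: one subgradient step with client vectors weighted by N_g / N.

    `clients` is a list of (X_g, y_g) pairs, used here only to simulate the
    clients locally. With these weights the aggregate equals a subgradient of
    the pooled empirical hinge risk (1/N) * sum over all N samples.
    """
    N = sum(X.shape[0] for X, _ in clients)  # N = sum_{g=1}^G N_g
    g = sum((X.shape[0] / N) * local_subgradient(w, X, y) for X, y in clients)
    return w - step_size * g
```

Weighting by $N_g/N$ is what makes the aggregated vector coincide with the subgradient computed on all $N$ pooled samples, even though no client reveals its data.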