\subsection{Scalability Experiment}\label{sec:scal}
The purpose of this experiment is to examine how our proposed algorithms scale as the total number of samples $N$, the number of clients $G$, and the number of features $P$ grow. We measure performance as the runtime required to achieve peak mCCR. We use this measure because the subgradient method lacks a practically implementable stopping criterion \citep{Bagirov2014}, and, similarly, no stopping criterion is provided for multi-block ADMM by \cite{lin2015}. Moreover, the experiment in Section \ref{sec:sens} already established that the SM algorithm requires more rounds of communication to attain peak performance, so this experiment focuses on the computational effort required to achieve that performance. We examine the following settings:
\begin{enumerate}
    \item \textit{Increasing clients [fixed training samples]}: $N = 1000$, $P = 4$, $G \in \{10,20,30,40,50\}$.
    \item \textit{Increasing clients [increasing training samples]}: $N = 100G$, $P = 4$, $G \in \{10,20,30,40,50\}$.
    \item \textit{Increasing training samples}: $G = 10$, $P = 4$, $N \in \{1000,1500,2000,2500,3000\}$.
    \item \textit{Increasing features}: $N = 1000$, $G = 10$, $P \in \{4,6,8,10,12\}$.
\end{enumerate}
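
The runtime-to-peak-mCCR measure used in all four settings can be illustrated with a minimal Python sketch. It assumes that the cumulative wall-clock time and the mCCR are logged after every round; the function name \texttt{runtime\_to\_peak} and the logging convention are hypothetical and are not taken from our implementation.

```python
from typing import Sequence, Tuple


def runtime_to_peak(
    cumulative_seconds: Sequence[float], mccr: Sequence[float]
) -> Tuple[float, float, int]:
    """Return (runtime, peak mCCR, round) for the first round attaining the peak.

    Assumption: cumulative_seconds[t] is the wall-clock time elapsed up to and
    including round t + 1, and mccr[t] is the mCCR measured after that round.
    """
    if len(cumulative_seconds) != len(mccr) or len(mccr) == 0:
        raise ValueError("need two non-empty logs of equal length")
    peak = max(mccr)
    first_round = list(mccr).index(peak)  # earliest round that reaches the peak
    return cumulative_seconds[first_round], peak, first_round + 1


# Hypothetical log of four rounds: the peak of 0.8 is first reached in round 2.
times = [0.1, 0.2, 0.3, 0.4]
scores = [0.6, 0.8, 0.8, 0.7]
result = runtime_to_peak(times, scores)
```

Taking the first round that attains the peak, rather than the last, avoids charging an algorithm for rounds that no longer improve the mCCR.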

\textbf{Dataset.} This experiment uses simulated data generated with the \texttt{make\_classification} function of the Scikit-Learn Python package \citep{scikit-learn}. The generated data belongs to two classes, each of which is sampled from a unit-variance Gaussian distribution whose mean is located at a vertex of a $P$-dimensional hypercube with sides of length $2.4$ centered at the origin. The data is distributed equally across all clients and both classes, and no labels are altered.
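
A minimal sketch of how such a dataset could be generated and split is given below. The arguments \texttt{n\_informative=P}, \texttt{n\_clusters\_per\_class=1}, and the seed are assumptions, as is the helper name \texttt{make\_federated\_data}; \texttt{class\_sep=1.2} is chosen because \texttt{make\_classification} places clusters on a hypercube with sides of length \texttt{2*class\_sep}, and \texttt{flip\_y=0} leaves all labels unaltered.

```python
import numpy as np
from sklearn.datasets import make_classification


def make_federated_data(N, G, P, seed=0):
    """Generate N two-class samples with P features and split them over G clients.

    Assumptions (not stated in the text): every feature is informative and each
    class forms a single cluster.
    """
    X, y = make_classification(
        n_samples=N,
        n_features=P,
        n_informative=P,         # assumption: no redundant or noise features
        n_redundant=0,
        n_repeated=0,
        n_classes=2,
        n_clusters_per_class=1,  # assumption: one Gaussian cluster per class
        class_sep=1.2,           # hypercube side length = 2 * class_sep = 2.4
        flip_y=0.0,              # no labels are altered
        random_state=seed,
    )
    rng = np.random.default_rng(seed)
    per_client = [[] for _ in range(G)]
    # Split each class separately so every client receives a balanced share.
    for label in (0, 1):
        idx = rng.permutation(np.flatnonzero(y == label))
        for g, chunk in enumerate(np.array_split(idx, G)):
            per_client[g].append(chunk)
    clients = []
    for g in range(G):
        idx = np.concatenate(per_client[g])
        clients.append((X[idx], y[idx]))
    return clients


clients = make_federated_data(N=1000, G=10, P=4)
```

With $N = 1000$ and $G = 10$, each client in this sketch holds $100$ samples, $50$ from each class.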

\textbf{Baseline.} We use the centralized DR-SVM of \cite{2019regularization} as the baseline in this experiment.

\textbf{Hyperparameters.} For the SM algorithm, we test performance for $T \in \{ 140,180,220 \}$ and $\gamma \in \{ 10^{1},10^{2},10^{3} \}$. For the ADMM and ADMM-SC algorithms, we test performance for $T \in \{ 10,20,30 \}$ and $\rho \in \{ 10^{-3},10^{-2},10^{-1} \}$. Across all algorithms, we fix $\varepsilon_g = \frac{1}{10N_g}$ and $\kappa = 0.25$. The central model's hyperparameters are varied in the same way as in the Sensitivity Analysis portion of the experiment in Section \ref{sec:sens}, and the reported runtime reflects the time taken to solve the optimization problem.
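
The grids above can be enumerated with a short Python sketch. The dictionary layout and the helper names \texttt{expand} and \texttt{epsilon\_g} are hypothetical, and $N_g$ is assumed to denote the number of samples held by client $g$.

```python
from itertools import product

# Grids as stated in the text; the dictionary layout is an assumption.
SM_GRID = {"T": [140, 180, 220], "gamma": [1e1, 1e2, 1e3]}
ADMM_GRID = {"T": [10, 20, 30], "rho": [1e-3, 1e-2, 1e-1]}  # ADMM and ADMM-SC
KAPPA = 0.25  # fixed across all algorithms


def epsilon_g(n_g):
    """Radius epsilon_g = 1 / (10 * N_g) for a client with n_g samples."""
    return 1.0 / (10 * n_g)


def expand(grid):
    """Return every combination of the grid values as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


sm_configs = expand(SM_GRID)      # 3 x 3 = 9 configurations
admm_configs = expand(ADMM_GRID)  # 3 x 3 = 9 configurations
```

For example, a client with $N_g = 100$ samples has $\varepsilon_g = 10^{-3}$ under this rule.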

\textbf{Results.} The results of this study are reported in Figure \ref{fig:scal}. We observe a rough trend of increasing runtime as $N$ and $P$ increase for all models, due to the increasing complexity of the local client problems. However, the trend is clearest for the SM algorithm, noisy for all versions of the ADMM algorithm, and hardly observable for the central model. This could be attributed to the fact that the SM algorithm requires a much longer time to reach peak mCCR, which makes the effect of random system-level timing variations on the reported time minimal. In contrast, all versions of ADMM and the central model reach peak mCCR in a very short time, which makes the reported time highly susceptible to such variations. These results highlight that any performance gains achieved by using SM come at the cost of a much longer runtime, whereas the runtime of ADMM and ADMM-SC is much closer to that of the central approach. Additionally, we observe that the runtime remains roughly constant for all federated algorithms as $G$ increases if $N$ is fixed. This is because a fixed $N$ leaves each client with fewer samples as $G$ increases, so the local problem at each client becomes simpler and faster to solve. In contrast, when both $G$ and $N$ increase, all algorithms exhibit a trend of increasing runtime with the number of clients.
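
The difference between the two client-scaling settings can be made concrete with a short calculation. It assumes that the per-client sample size is $N_g = N/G$, which follows from the equal split of the data across clients:
\[
N = 1000:\quad N_g = \frac{1000}{G}, \ \text{which falls from } 100 \text{ at } G = 10 \text{ to } 20 \text{ at } G = 50;
\qquad
N = 100G:\quad N_g = \frac{100G}{G} = 100 \ \text{for every } G.
\]
In the first setting, each local problem shrinks as clients are added, whereas in the second the local problem size stays fixed while the number of clients grows.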

\begin{figure*}[ht]
    \centering
    \includegraphics[width=1\textwidth]{Figures/scalability.pdf}
    \caption{Plots of Runtime to Reach Peak mCCR vs. the Number of Clients $G$ with Fixed and Increasing $N$, the Number of Features $P$, and the Number of Training Samples $N$ for All Methods Tested.}
    \label{fig:scal}
\end{figure*}