Keywords: Federated Learning, One-Shot Federated Learning, Ensemble Method, Data Heterogeneity, Extreme Label Skew, Pruning
Abstract: Federated learning (FL) has gained widespread adoption as a privacy-preserving framework for distributed model training, yet it continues to face persistent challenges, most notably statistical heterogeneity and high communication cost. The dominant paradigm in FL is consensus-driven averaging of model parameters across clients, and most recent methods, despite their innovations, remain anchored in repeated rounds of parameter averaging as the backbone of their design. The substantial communication overhead of these repeated rounds is an obvious drawback; whether the approach can succeed under heterogeneous data at all is less settled, and it forms the central focus of this paper. We argue that this prevailing approach fails to address heterogeneity. Using extreme label skew as a lens to expose its limitations, we demonstrate that even the most recent methods that ultimately rely on parameter averaging remain fundamentally limited in such settings. We instead advocate for an emerging alternative: ensemble-based FL with open-set recognition (OSR), which preserves client-specific models and selectively leverages their strengths, thereby directly mitigating the information loss and distortion caused by parameter averaging in heterogeneous settings. We consider this approach a principled path forward for addressing heterogeneity, and we substantiate this view through both theoretical analysis and extensive experiments. We also acknowledge its primary limitation: the ensemble grows linearly with the number of clients, which hinders scalability. As a step forward in this direction, we introduce FedEOV, which incorporates improved negative-sample generation to prevent shortcut cues, and FedEOV-pruned, which addresses the scalability problem through pruning rather than distillation, thus avoiding the need for server-side data or additional training at the server. Our experiments across multiple datasets and heterogeneity settings confirm the superiority of our method, which achieves an average improvement of 16.76% over the state-of-the-art ensemble baseline, FedOV, under extreme label skew and up to 102% over FedGF, the top-performing parameter-averaging method. Furthermore, we show that pruned federated ensembles perform on par with distilled ensembles, without any server-side data or training requirements, even when the latter are distilled using data from the same datasets. Code is available at: https://github.com/Anonymous6868-hue/FedEOV
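To make the ensemble-with-OSR idea sketched in the abstract concrete, the snippet below is a minimal, hypothetical illustration (not the authors' FedEOV implementation): each client-specific model is assumed to return class probabilities plus an "unknown" score, and the server keeps every model and weights its vote by how in-distribution the sample looks to that client, instead of averaging parameters. The function name `ensemble_predict` and the output format are assumptions made for this sketch only.

```python
# Minimal sketch, assuming each client model outputs (class_probs, unknown_prob)
# for a test sample; models that flag the sample as out-of-distribution are
# down-weighted rather than averaged into a single parameter vector.
import numpy as np

def ensemble_predict(client_outputs, num_classes):
    """client_outputs: list of (class_probs, unknown_prob) tuples,
    one per client-specific open-set model, for a single sample."""
    votes = np.zeros(num_classes)
    for class_probs, unknown_prob in client_outputs:
        # Weight each client's vote by its confidence that the sample
        # belongs to one of its known (locally observed) labels.
        votes += (1.0 - unknown_prob) * class_probs
    return int(np.argmax(votes))

# Toy usage: two clients trained on disjoint label subsets.
outputs = [
    (np.array([0.70, 0.20, 0.05, 0.05]), 0.1),  # knows this sample's label
    (np.array([0.25, 0.25, 0.25, 0.25]), 0.9),  # flags it as unknown
]
print(ensemble_predict(outputs, num_classes=4))  # -> 0
```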
Supplementary Material: pdf
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 23124