Abstract: Consensus-driven parameter averaging constitutes the dominant paradigm in federated learning. Although many methods incorporate auxiliary mechanisms or refinements, repeated rounds of averaging remain their fundamental backbone. This paradigm inherently depends on repeated client–server communication to maintain consensus, a reliance that is further amplified in regimes with high data heterogeneity and large client populations, as shown across numerous studies. This behavior arises from optimization drift in out-of-distribution settings: when client objectives differ, multi-step local SGD updates increasingly diverge, making consensus difficult to maintain. We argue that an emerging alternative, ensemble with abstention, provides a more suitable framework for addressing these issues. Rather than enforcing consensus across diverging client objectives, this approach constructs a specialized mixture-of-experts model by preserving client-specific models and selectively aggregating their predictions. As a one-shot FL method, it eliminates the need for repeated communication rounds altogether. Moreover, supported by both theoretical and empirical analysis, we show that this paradigm sidesteps cross-client drift and is inherently less sensitive to data heterogeneity. Despite these advantages, ensemble with abstention introduces two fundamental challenges. First, its performance hinges on the design of the open-set recognition (OSR) task, especially under heterogeneity. Second, and more critically, preserving client-specific models causes model size to grow linearly with the number of clients, limiting scalability. As a step toward addressing these limitations, we introduce FedSOV, which incorporates improved negative sample generation to prevent shortcut cues in the OSR task and employs pruning to address the scalability problem.
We show that pruning provides a practical and effective solution to the scalability problem while simultaneously enhancing generalization, yielding higher test accuracy. Across datasets, our method achieves an average gain of $18.81\%$ over the ensemble baseline FedOV in extreme label-skew settings and up to $92.43\%$ over FedGF, the best-performing parameter-averaging method. Code is available at: https://anonymous.4open.science/r/FedSOV-C7EF/
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Eugene_Belilovsky1
Submission Number: 7588