The FLResilience Benchmark: A Systematic Framework for Evaluating Federated Learning Robustness to System-Level Failures

Published: 27 Jan 2026, Last Modified: 07 Apr 2026
Venue: FLCA Poster
License: CC BY 4.0
Keywords: Federated Learning, Robust Aggregation, Byzantine Resilience, System Failures, Benchmarking, Critical Applications, Distributed Machine Learning
TL;DR: FLResilience benchmark systematically evaluates federated learning robustness against system failures and attacks, revealing critical performance-security trade-offs for real-world deployment.
Abstract: Federated Learning (FL) deployment in critical applications is hindered by system-level failures and adversarial attacks. Current research lacks standardized evaluation of robust aggregation algorithms under realistic conditions. This paper introduces FLResilience, a comprehensive benchmark for systematically evaluating FL robustness against client dropout, stragglers, and Byzantine attacks. Through extensive experiments across diverse datasets and non-IID settings, we demonstrate that robust aggregators such as Median and FoolsGold significantly outperform conventional methods, providing up to 45% higher robustness scores; our analysis also reveals critical performance-security trade-offs essential for real-world FL deployment.
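To illustrate one of the robust aggregators named in the abstract, the following is a minimal sketch of coordinate-wise median aggregation in plain Python. It is an illustration of the general technique, not code from the FLResilience benchmark; the function and variable names are hypothetical.

```python
from statistics import median

def coordinate_wise_median(client_updates):
    """Aggregate client model updates by taking the median of each coordinate.

    Unlike plain averaging, the per-coordinate median bounds the influence of
    any single Byzantine client, since extreme values are never interpolated
    toward by more than the middle of the sorted coordinate values.
    """
    # client_updates: list of equal-length parameter vectors, one per client.
    return [median(coord) for coord in zip(*client_updates)]

# Hypothetical example: three honest clients send similar updates,
# one Byzantine client sends an extreme update.
honest = [[0.9, 1.1], [1.0, 1.0], [1.1, 0.9]]
byzantine = [[100.0, -100.0]]
print(coordinate_wise_median(honest + byzantine))  # -> [1.05, 0.95]
```

With four clients, `statistics.median` averages the two middle values per coordinate, so the aggregate stays near the honest consensus of 1.0 despite the outlier, whereas a plain mean would be pulled to roughly 25.7 in the first coordinate.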
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 20