Certifiably Robust Model Evaluation in Federated Learning under Meta-Distributional Shifts

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: In a robust federated model evaluation scenario (A/B testing), we propose an efficient, privacy-preserving aggregation method that combines local adversarial risks, and we prove that it achieves minimax-optimal guarantees against global (meta-distributional) shifts.
Abstract: We address the challenge of certifying the performance of a federated learning model on an unseen target network using only measurements from the source network that trained the model. Specifically, consider a source network "A" with $K$ clients, each holding private, non-IID datasets drawn from heterogeneous distributions, modeled as samples from a broader meta-distribution $\mu$. Our goal is to provide certified guarantees for the model’s performance on a different, unseen network "B", governed by an unknown meta-distribution $\mu'$, assuming the deviation between $\mu$ and $\mu'$ is bounded—either in Wasserstein distance or an $f$-divergence. We derive worst-case uniform guarantees for both the model’s average loss and its risk CDF, the latter corresponding to a novel, adversarially robust version of the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality. In addition, we show how the vanilla DKW bound enables principled certification of the model's true performance on unseen clients within the same (source) network. Our bounds are efficiently computable, asymptotically minimax optimal, and preserve clients' privacy. We also establish non-asymptotic generalization bounds that converge to zero as $K$ grows and the minimum per-client sample size exceeds $\mathcal{O}(\log K)$. Empirical evaluations confirm the practical utility of our bounds across real-world tasks. The project code is available at: github.com/samin-mehdizadeh/Robust-Evaluation-DKW
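The following is a minimal illustrative sketch (not the authors' released code) of the vanilla DKW certification step mentioned in the abstract: building a confidence band for the risk CDF over unseen clients of the same source network from $K$ per-client risk measurements. Names such as `dkw_band`, `client_risks`, and `delta` are hypothetical, and local estimation error of each client's risk is ignored.

```python
# Hypothetical sketch: a two-sided DKW confidence band over K per-client risks.
# Assumes each client reports a scalar risk treated as an i.i.d. draw from the
# meta-distribution of client risks; not the paper's robust (shifted) bound.
import numpy as np

def dkw_band(client_risks, delta=0.05):
    """DKW band for the risk CDF built from K observed client risks.

    With probability at least 1 - delta, the true risk CDF F satisfies
    |F_hat(x) - F(x)| <= eps for all x, where eps = sqrt(log(2/delta) / (2K)).
    """
    risks = np.sort(np.asarray(client_risks, dtype=float))
    K = risks.size
    eps = np.sqrt(np.log(2.0 / delta) / (2.0 * K))
    # Empirical CDF evaluated at each observed risk value.
    F_hat = np.arange(1, K + 1) / K
    lower = np.clip(F_hat - eps, 0.0, 1.0)
    upper = np.clip(F_hat + eps, 0.0, 1.0)
    return risks, lower, upper

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    observed = rng.beta(2, 5, size=100)   # stand-in for K = 100 local risk estimates
    risks, lo, hi = dkw_band(observed, delta=0.05)
    t = 0.5
    # Lower bound (valid w.p. >= 0.95) on the fraction of unseen same-network
    # clients whose risk does not exceed the threshold t.
    idx = np.searchsorted(risks, t, side="right") - 1
    frac_below = lo[idx] if idx >= 0 else 0.0
    print(f"At least {frac_below:.2f} of clients have risk <= {t}")
```

The paper's robust variant additionally accounts for a bounded meta-distributional shift (in Wasserstein distance or an $f$-divergence) between the source and target networks; that adjustment is not reproduced in this sketch.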
Lay Summary: We propose a privacy-preserving, polynomial-time procedure for evaluating the performance of a given machine learning model over a federated network of $K$ clients. Our goal, however, is to provide robust (worst-case) performance guarantees, i.e., bounds that remain valid when the model is deployed on an unseen network whose client distribution is close to that of the source. This scenario frequently arises in pilot deployments, where the pilot is conducted in a slightly different region, city, or community. We demonstrate that both the model's average loss and its risk CDF can be uniformly and robustly bounded while respecting the aforementioned privacy and efficiency constraints. To achieve this, we reformulate classical results in statistics, namely the Glivenko–Cantelli theorem and the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality, into novel, adversarially robust versions that account for distributional shifts. These robust formulations underpin our theoretical guarantees and allow us to quantify uncertainty under worst-case deviations. Finally, we validate our bounds through extensive numerical experiments on real-world datasets, demonstrating both their practical accuracy and their resilience in heterogeneous and privacy-sensitive settings.
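For context, the classical (non-robust) DKW inequality referenced above states that, for an empirical CDF $\hat{F}_K$ built from $K$ i.i.d. samples of a real-valued quantity (here, the client risks) with true CDF $F$,

$$
\Pr\!\left(\sup_{x \in \mathbb{R}} \left|\hat{F}_K(x) - F(x)\right| > \varepsilon\right) \le 2\, e^{-2K\varepsilon^2} \quad \text{for all } \varepsilon > 0.
$$

The paper's contribution, per the summary above, is an adversarially robust counterpart of this bound under bounded meta-distributional shift; that robust statement is not reproduced here.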
Link To Code: github.com/samin-mehdizadeh/Robust-Evaluation-DKW
Primary Area: Theory->Learning Theory
Keywords: Distributionally Robust Optimization, Generalization Bound, Glivenko-Cantelli Theorem and DKW Inequality, $f$-divergence and Wasserstein Adversaries
Submission Number: 3310