TL;DR: We provide conditional mutual information generalization bounds for federated learning
Abstract: Federated learning (FL) is a widely adopted privacy-preserving distributed learning framework, yet its generalization performance remains less explored than that of centralized learning. In FL, the generalization error consists of two components: the out-of-sample gap, which measures the difference between the empirical and true risks of participating clients, and the participation gap, which quantifies the risk difference between participating and non-participating clients. In this work, we apply an information-theoretic analysis via the conditional mutual information (CMI) framework to study FL's two-level generalization. Beyond the traditional supersample-based CMI framework, we introduce a superclient construction to accommodate the two-level generalization setting in FL. We derive multiple CMI-based bounds, including hypothesis-based CMI bounds, which illustrate how privacy constraints in FL can imply generalization guarantees. Furthermore, we propose fast-rate evaluated CMI bounds that recover the best-known convergence rate for two-level FL generalization in the small empirical risk regime. For specific FL model aggregation strategies and structured loss functions, we refine our bounds to achieve improved convergence rates with respect to the number of participating clients. Empirical evaluations confirm that our evaluated CMI bounds are non-vacuous and accurately capture the generalization behavior of FL algorithms.
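As a rough illustration of the two-level decomposition described above (notation is ours, not taken from the paper): write $W$ for the learned hypothesis, suppose $n$ participating clients with data distributions $\mu_1,\dots,\mu_n$ are drawn from a meta-distribution $\mathcal{P}$, and each client holds $m$ local samples $Z_{i,1},\dots,Z_{i,m}$. Under these assumptions, the expected generalization error can be sketched as
\[
\underbrace{\mathbb{E}\!\left[L_{\mathrm{non}}(W) - \hat{L}_{\mathrm{part}}(W)\right]}_{\text{generalization error}}
= \underbrace{\mathbb{E}\!\left[L_{\mathrm{part}}(W) - \hat{L}_{\mathrm{part}}(W)\right]}_{\text{out-of-sample gap}}
+ \underbrace{\mathbb{E}\!\left[L_{\mathrm{non}}(W) - L_{\mathrm{part}}(W)\right]}_{\text{participation gap}},
\]
where $\hat{L}_{\mathrm{part}}(W)=\frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\ell(W,Z_{i,j})$ is the empirical risk on the participating clients' samples, $L_{\mathrm{part}}(W)=\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}_{Z\sim\mu_i}[\ell(W,Z)]$ is their population risk, and $L_{\mathrm{non}}(W)=\mathbb{E}_{\mu\sim\mathcal{P}}\,\mathbb{E}_{Z\sim\mu}[\ell(W,Z)]$ is the risk on a fresh, non-participating client.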
Lay Summary: This paper looks at how well federated learning (FL)—a way to train AI models without collecting users' data—can perform on new, unseen data. While FL helps protect privacy, it's less understood than traditional training methods when it comes to generalizing to new users. We break down the sources of error in FL and use tools from information theory to better understand and measure these errors. We also propose new ways to predict how well FL models are likely to perform in practice. Our results provide stronger guarantees about FL performance and are backed up by experiments showing that the predictions match real-world behavior.
Primary Area: Theory->Learning Theory
Keywords: Generalization, Federated Learning, Information-theoretic generalization bounds
Submission Number: 7012