Disentangling Federated Learning Heterogeneity: A Dual-Perspective Analysis of Quantifying Skew versus Scarcity

Published: 03 Feb 2026, Last Modified: 03 Feb 2026AISTATS 2026 PosterEveryoneRevisionsBibTeXCC BY 4.0
Abstract: Federated Learning (FL) faces significant challenges due to data heterogeneity, which manifests as label distribution skew and label missingness. We propose Skew-Scarcity Disentanglement Indicator (SSDI), a novel metric that decomposes heterogeneity into two orthogonal components: Label Distribution Skew (LDS) (quantity skew of present labels) and Label Coverage Deficiency (LCD) (deviation due to missing labels). Using a PAC-Bayesian framework, we derive a generalization bound showing that Label Coverage Deficiency becomes the dominant risk factor as the number of clients increases, severely degrading accuracy on rare labels. Our study reveals that, for a fixed number of labels, increasing clients is the primary driver of per-label accuracy variance by exacerbating Label Coverage Deficiency. Moreover, a higher global missing rate intensifies this divergence effect and precipitates severe performance breakdown at a lower critical threshold of clients. Experiments on vision benchmarks confirm that SSDI accurately captures the severity of performance divergence. The SSDI framework provides a principled tool for diagnosing heterogeneity and guiding targeted mitigation strategies.
Submission Number: 2296
Loading