Disentangling Federated Learning Heterogeneity: A Dual-Perspective Analysis of Quantifying Skew Versus Scarcity

Wenkai Zeng; NAN YANG; Zhiyu Zhu; Zhibo Jin; Dong Yuan

Published: 03 Feb 2026, Last Modified: 02 May 2026. Venue: AISTATS 2026 Poster. License: CC BY 4.0

Abstract: Federated Learning faces significant challenges due to data heterogeneity, which manifests as label distribution skew and label missingness. We propose the Skew-Scarcity Disentanglement Indicator (SSDI), a novel metric that decomposes heterogeneity into two disentangled components: Label Distribution Skew (LDS), the quantity skew among present labels, and Label Coverage Deficiency (LCD), the deviation due to missing labels. Using a PAC-Bayesian framework, we derive a generalization bound indicating that LCD becomes the dominant risk factor as the number of clients increases, severely degrading accuracy on rare labels. Our study reveals that, for a fixed number of labels, increasing the number of clients is a primary driver of per-label accuracy variance because it exacerbates LCD. Moreover, a higher global missing rate intensifies this divergence and can precipitate severe performance breakdown at a lower critical number of clients. Experiments on vision benchmarks confirm that SSDI accurately captures the severity of performance divergence. The SSDI framework provides a principled tool for diagnosing heterogeneity and guiding targeted mitigation strategies. The code for the SSDI-controlled client-label matrix generation used in our experiments is available at https://github.com/wkzeng/SSDI.git.
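To make the skew-versus-scarcity split concrete, the sketch below computes two simple per-client quantities from a client-label count matrix. It is an illustration only: the abstract does not give the SSDI formulas, so the stand-ins used here (share of absent labels for coverage deficiency, total variation distance over present labels for skew), the function names, and the toy matrix are all assumptions and not the authors' definitions or the API of the linked repository.

```python
# Illustrative stand-ins only; these are NOT the paper's SSDI definitions.
# counts[k][j] = number of samples of label j held by client k (toy data).
counts = [
    [50, 50, 0, 0],    # client 0: balanced, but two labels missing
    [0, 0, 90, 10],    # client 1: two labels missing and skewed
    [25, 25, 25, 25],  # client 2: full coverage
]


def label_totals(counts):
    """Global sample count per label, summed over all clients."""
    return [sum(col) for col in zip(*counts)]


def global_missing_rate(counts):
    """Fraction of (client, label) cells that contain no samples."""
    cells = [c for row in counts for c in row]
    return sum(1 for c in cells if c == 0) / len(cells)


def coverage_deficiency(row):
    """Assumed proxy for scarcity: share of labels absent on one client."""
    return sum(1 for c in row if c == 0) / len(row)


def present_label_skew(row, totals):
    """Assumed proxy for skew: total variation distance between the client's
    label distribution and the global one, both restricted to the labels the
    client actually holds and renormalized."""
    present = [j for j, c in enumerate(row) if c > 0]
    if not present:
        return 0.0
    local_sum = sum(row[j] for j in present)
    global_sum = sum(totals[j] for j in present)
    return 0.5 * sum(
        abs(row[j] / local_sum - totals[j] / global_sum) for j in present
    )


totals = label_totals(counts)
for k, row in enumerate(counts):
    print(k, coverage_deficiency(row), round(present_label_skew(row, totals), 4))
print("global missing rate:", round(global_missing_rate(counts), 4))
```

In this toy matrix, client 0 has no skew among the labels it holds yet misses half of the label space, while client 2 covers every label but still deviates from the global proportions. Restricting the skew term to present labels is what keeps the two effects from being counted twice; the paper's actual decomposition may differ.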

Code Dataset Promise: Yes

Code Dataset Url: https://github.com/wkzeng/SSDI.git

Submission Number: 2296
