Uncertainty-Based Extensible Codebook for Discrete Federated Learning in Heterogeneous Data Silos

Published: 01 May 2025; Last Modified: 18 Jun 2025. ICML 2025 poster. License: CC BY 4.0.
Abstract: Federated learning (FL), aimed at leveraging vast distributed datasets, confronts a crucial challenge: the heterogeneity of data across different silos. While previous studies have explored discrete representations to enhance model generalization across minor distributional shifts, these approaches often struggle to adapt to new data silos with significantly divergent distributions. In response, we first observe that models trained with FL exhibit markedly increased uncertainty when applied to data silos with unfamiliar distributions. Consequently, we propose an innovative yet straightforward iterative framework, termed \emph{Uncertainty-Based Extensible-Codebook Federated Learning (UEFL)}. This framework dynamically maps latent features to trainable discrete vectors, assesses the uncertainty, and extends the discretization dictionary, or codebook, specifically for silos exhibiting high uncertainty. Our approach simultaneously enhances accuracy and reduces uncertainty by explicitly addressing the diversity of data distributions, while maintaining minimal computational overhead in environments characterized by heterogeneous data silos. Extensive experiments across multiple datasets demonstrate that UEFL outperforms state-of-the-art methods, achieving significant improvements in accuracy (by 3\%--22.1\%) and uncertainty reduction (by 38.83\%--96.24\%). The source code is available at https://github.com/destiny301/uefl.
Lay Summary: Modern machine learning often relies on federated learning (FL) to train models using data from many different sources—like phones or hospitals—without sharing private information. But this approach struggles when each source (or "silo") has very different types of data. In these cases, combining updates from different silos leads to poor accuracy and inconsistent predictions. To fix this, we developed a new method called Uncertainty-Based Extensible-Codebook Federated Learning (UEFL). UEFL converts features into simplified discrete representations and dynamically adds new ones when the model is unsure. This allows the model to learn from each silo more effectively, even when the data is very different. Our experiments show that UEFL improves accuracy by up to 22% and reduces uncertainty by as much as 96%, all with little added computation. It also works well when tested on completely new types of data. This makes UEFL a practical solution for real-world FL systems—like mobile apps or healthcare platforms—where data differences are the norm.
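To make the core loop concrete, below is a minimal sketch of the mechanism the abstract describes: map latent features to their nearest codebook vectors, measure a silo's predictive uncertainty, and grow the codebook when that uncertainty is high. This is an illustrative toy, not the authors' implementation (see the linked repository for that); the function names, the entropy-based uncertainty proxy, the threshold, and the feature-sampling initialization of new codewords are all assumptions made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(features, codebook):
    """Map each latent feature to its nearest codebook vector (vector quantization)."""
    # Pairwise squared distances between features (N, D) and codewords (K, D) -> (N, K)
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)
    return codebook[idx], idx

def predictive_entropy(probs, eps=1e-12):
    """Mean entropy of class probabilities -- one simple uncertainty proxy."""
    return float(-(probs * np.log(probs + eps)).sum(axis=1).mean())

def maybe_extend(codebook, features, uncertainty, threshold, n_new=4):
    """If a silo's uncertainty exceeds the threshold, append new codewords
    initialized from that silo's own features (a hypothetical init scheme)."""
    if uncertainty <= threshold:
        return codebook
    new = features[rng.choice(len(features), size=n_new, replace=False)]
    return np.concatenate([codebook, new], axis=0)

# Toy silo: 32 latent features of dim 8, initial codebook of 16 codewords.
features = rng.normal(size=(32, 8))
codebook = rng.normal(size=(16, 8))

quantized, idx = quantize(features, codebook)

# Stand-in classifier output: near-uniform probabilities over 10 classes,
# i.e. a highly uncertain silo (entropy ~ ln(10) ~ 2.30).
probs = np.full((32, 10), 0.1)
uncertainty = predictive_entropy(probs)
codebook = maybe_extend(codebook, features, uncertainty, threshold=1.0)
print(codebook.shape)  # (20, 8): the codebook grew for this uncertain silo
```

In a full training loop this check would run iteratively: after each round, silos whose uncertainty stays above the threshold trigger another codebook extension, while low-uncertainty silos keep the shared codebook unchanged.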
Link To Code: https://github.com/destiny301/uefl
Primary Area: Social Aspects->Fairness
Keywords: federated learning, heterogeneous data, discretization
Submission Number: 8795