United Yet Distinct: Domain Preservation via Divergence Reduction

TMLR Paper 6248 Authors

18 Oct 2025 (modified: 30 Oct 2025) · Under review for TMLR · CC BY 4.0
Abstract: Although a vast amount of data is available for training Large Language Models (LLMs), data privacy concerns can prevent centralized data aggregation, thereby limiting the capacity of LLMs to learn from distributed sources. Federated Learning (FL) has emerged as a dominant framework for distributed training; its objective is to preserve privacy while improving the performance of participating clients. However, the non-IID nature of client data can degrade model performance. Parameter-Efficient Fine-Tuning (PEFT) adapts LLMs to downstream tasks by adding or updating only a small number of parameters. Preserving performance while learning from disparate, distributed data therefore calls for efficient training frameworks. In this paper, we design and propose a novel FL aggregation algorithm, Divergence Reduction in Federated Training (DRIFT), which accounts for the divergence between clients during model aggregation and disseminates custom aggregated parameters back to each client. DRIFT measures the degree to which the PEFT parameters of the participating clients diverge and exploits the graph-based structure implied by this divergence. We design two variants of DRIFT and, through extensive experimentation, show that DRIFT outperforms well-established baselines. Our training data and code are available at: https://anonymous.4open.science/r/drift-240F.
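To make the abstract's idea concrete, the following is a minimal, hypothetical sketch (not the authors' DRIFT implementation, whose details are in the paper and repository) of divergence-aware aggregation: the server measures pairwise divergence between clients' flattened PEFT parameters, converts it into a similarity-weighted graph, and returns a different weighted aggregate to each client. The function name, cosine-distance divergence measure, and `temperature` parameter are all illustrative assumptions.

```python
import numpy as np

def divergence_aware_aggregate(client_params, temperature=1.0):
    """Illustrative sketch of per-client aggregation weighted by parameter
    divergence; NOT the paper's exact DRIFT algorithm.

    client_params: list of equally-shaped arrays, one per client's PEFT update.
    Returns one custom aggregate per client.
    """
    flat = np.stack([p.ravel() for p in client_params])      # (n_clients, d)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.clip(norms, 1e-12, None)
    divergence = 1.0 - unit @ unit.T                         # cosine distance (assumed measure)
    weights = np.exp(-divergence / temperature)              # similar clients weigh more
    weights /= weights.sum(axis=1, keepdims=True)            # row-normalize: one mixture per client
    custom = weights @ flat                                  # per-client weighted aggregate
    return [c.reshape(client_params[0].shape) for c in custom]

# Usage: three clients' PEFT updates; each receives its own aggregate.
rng = np.random.default_rng(0)
clients = [rng.normal(size=(4, 2)) for _ in range(3)]
personalized = divergence_aware_aggregate(clients)
```

When all clients hold identical parameters the divergence matrix is zero, the weights are uniform, and each client simply gets the common average back; as divergence grows, each client's aggregate leans toward its nearer neighbors on the implied similarity graph.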
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: This revision uses the default font for TMLR.
Assigned Action Editor: ~Ilan_Shomorony1
Submission Number: 6248