On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: On-device LLMs have gained increasing attention for their ability to enhance privacy and provide a personalized user experience. To facilitate private learning with scarce data, Federated Learning has become a standard approach. However, it faces challenges such as computational resource heterogeneity and data heterogeneity among end users. We propose CoMiGS ($\textbf{Co}$llaborative learning with a $\textbf{Mi}$xture of $\textbf{G}$eneralists and $\textbf{S}$pecialists), the first approach to address both challenges. A key innovation of our method is the bi-level optimization formulation of the Mixture-of-Experts learning objective, in which the router is optimized on a separate validation set to ensure alignment with the target distribution. We solve this objective with alternating minimization, for which we provide a theoretical analysis. Our method shares generalist experts across users while localizing a varying number of specialist experts, thereby adapting to users' computational resources and preserving privacy. Through extensive experiments, we show that CoMiGS effectively balances general and personalized knowledge at each token-generation step. We demonstrate that CoMiGS remains robust against overfitting, owing to the generalists' regularizing effect, while still adapting to local data through its specialist experts. We open-source our codebase for collaborative LLMs.
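As a rough sketch of the bi-level formulation described above (the notation here is ours for illustration, not taken from the paper), let $\theta$ collect the expert parameters (shared generalists and local specialists) and $\phi$ the router parameters of a given user; the router is fit on a held-out validation split while the experts are fit on the training split:

$$\min_{\phi} \; \mathcal{L}_{\mathrm{val}}\big(\theta^{\star}(\phi), \phi\big) \quad \text{s.t.} \quad \theta^{\star}(\phi) \in \arg\min_{\theta} \; \mathcal{L}_{\mathrm{train}}(\theta, \phi).$$

Per the abstract, this objective is solved approximately by alternating minimization: an expert update on the training loss with the router held fixed, followed by a router update on the validation loss with the experts held fixed.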
Lay Summary: Modern language models are increasingly deployed on personal devices to preserve user privacy and improve personalization. However, this approach faces two major challenges: devices differ in computational power (model heterogeneity), and users have unique data and language habits (data heterogeneity). Traditional methods cannot effectively address both at once. We introduce CoMiGS, a collaborative learning framework that blends shared "generalist" knowledge with user-specific "specialist" insights. It dynamically routes each word prediction to the most suitable expert using a novel bi-level optimization algorithm that separates the training and validation phases. CoMiGS enables efficient, privacy-preserving language-model customization on devices with varying capabilities. It reduces communication costs by 50%, minimizes the risk of overfitting, and delivers consistent performance across users. This makes it a practical foundation for smarter, more adaptive AI on mobile and edge devices.
Primary Area: Deep Learning->Algorithms
Keywords: Federated Learning, Collaborative Learning, On-device LLMs, Mixture of Experts, Alternating Minimization
Submission Number: 7185