Keywords: Language Model
Abstract: We introduce FlexOLMo, a new class of language models (LMs) that supports (1) distributed training without data sharing, where different model parameters are independently trained on private datasets, and (2) data-flexible inference, where these parameters, along with their associated data, can be easily included or excluded from model inference with no further training. FlexOLMo employs a mixture-of-experts (MoE) architecture in which each expert is trained independently on private datasets and later integrated through a new nonparametric routing scheme without any joint training across datasets. FlexOLMo is trained on FLEXMIX, a corpus we curate comprising seven restricted sets, either real or realistic approximations, alongside publicly available datasets. We evaluate models with up to 37 billion parameters (20 billion active) on 31 diverse downstream tasks. We show that a general expert trained on public data can be effectively combined with independently trained experts from other data owners, benefiting significantly from these restricted sets (an average 41% relative improvement) while allowing flexible opt-out at inference time (e.g., for users without appropriate licenses or permissions). Our approach also outperforms prior model merging methods by 10.1% on average and surpasses the standard MoE trained without data restrictions with the same training FLOPs. Altogether, FlexOLMo enables training on restricted data while keeping data local and supports fine-grained control of data access at inference.
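The abstract's core mechanism is an MoE layer whose expert feed-forward networks are trained independently by different data owners and then mixed at inference through a frozen, nonparametric router, with per-expert opt-out and no joint training. Below is a minimal PyTorch-style sketch of that general idea; the module names (FeedForward, FlexMoELayer), the similarity-based router embeddings, and the top-k mixing are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the authors' code) of combining independently trained
# expert FFNs with a frozen, nonparametric router and inference-time opt-out.
from __future__ import annotations

import torch
import torch.nn as nn
import torch.nn.functional as F


class FeedForward(nn.Module):
    """A standard transformer FFN; each data owner trains one independently."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.gelu(self.up(x)))


class FlexMoELayer(nn.Module):
    """Mix independently trained expert FFNs without joint training.

    The router is nonparametric: each expert is represented by a fixed
    embedding (e.g., derived from its training data), and tokens are routed
    by similarity to those embeddings. Experts can be excluded at inference
    time via `active_experts` to opt their data out.
    """

    def __init__(self, experts: list[nn.Module], router_embeddings: torch.Tensor, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        # Frozen router weights: no gradient updates, no joint training across datasets.
        self.register_buffer("router_embeddings", router_embeddings)  # [n_experts, d_model]
        self.top_k = top_k

    def forward(self, x: torch.Tensor, active_experts: list[int] | None = None) -> torch.Tensor:
        # x: [batch, seq, d_model] -> routing scores: [batch, seq, n_experts]
        logits = x @ self.router_embeddings.T
        if active_experts is not None:
            # Opt-out: mask excluded experts so they receive no tokens.
            mask = torch.full_like(logits, float("-inf"))
            mask[..., active_experts] = 0.0
            logits = logits + mask
        n_avail = len(active_experts) if active_experts is not None else len(self.experts)
        k = min(self.top_k, n_avail)
        weights, indices = logits.topk(k, dim=-1)
        weights = weights.softmax(dim=-1)

        out = torch.zeros_like(x)
        for slot in range(k):
            for e, expert in enumerate(self.experts):
                token_mask = indices[..., slot] == e  # tokens routed to expert e in this slot
                if token_mask.any():
                    out[token_mask] += weights[..., slot][token_mask].unsqueeze(-1) * expert(x[token_mask])
        return out


if __name__ == "__main__":
    d_model = 64
    # e.g., one public expert plus two experts trained by separate data owners
    experts = [FeedForward(d_model, 4 * d_model) for _ in range(3)]
    layer = FlexMoELayer(experts, router_embeddings=torch.randn(3, d_model), top_k=2)
    hidden = torch.randn(2, 10, d_model)
    y_full = layer(hidden)                            # all experts participate
    y_opt_out = layer(hidden, active_experts=[0, 2])  # exclude expert 1 at inference
    print(y_full.shape, y_opt_out.shape)
```

The opt-out path simply masks an expert's routing scores, so removing a data owner's contribution requires no retraining, matching the data-flexible inference described in the abstract.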
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 18204