Keywords: Parameter-efficient fine-tuning, Large Language Model, Adaptation, Transformer
Abstract: We introduce MoSA, a new parameter-efficient fine-tuning (PEFT) method that replaces low-rank factorization with randomized, fine-grained sharing of weight updates. Each adapted weight matrix is constructed by broadcasting a small set of learned scalars over a fixed tessellation, a pre-defined grouping of the weight matrix's entries, producing expressive updates under the same parameter budget as low-rank adaptation (LoRA). MoSA requires no architectural changes and can be merged into the base model for zero-overhead inference. Across diverse language understanding and generation tasks, MoSA matches or surpasses strong PEFT baselines under strictly matched budgets. Analyses and ablations indicate that non-local parameter sharing acts as an effective regularizer, and that grouping design and budget allocation govern the expressivity–efficiency trade-off. These results position MoSA as a simple, scalable alternative to LoRA.
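The broadcasting mechanism described in the abstract can be sketched minimally as follows. This is an illustrative NumPy sketch under assumed shapes and names (`groups`, `scalars`, `k` are all hypothetical stand-ins, not the paper's actual implementation): each entry of the weight update is taken from one of `k` learned scalars according to a fixed random group assignment, and the resulting update is merged into the frozen base weight.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, k = 8, 8, 16  # k learned scalars set the parameter budget

# Fixed "tessellation": each weight entry is randomly assigned to one of k groups.
# This assignment is frozen before training and never updated.
groups = rng.integers(0, k, size=(d_out, d_in))

# Learned scalars (random stand-ins here; in training these would be optimized).
scalars = rng.normal(size=k)

# Broadcast the scalars over the tessellation to form the dense weight update.
delta_W = scalars[groups]  # shape (d_out, d_in); at most k distinct values

# Merge into the frozen base weight, so inference adds no overhead.
W_base = rng.normal(size=(d_out, d_in))
W_merged = W_base + delta_W
```

Because `delta_W` contains at most `k` distinct values shared non-locally across the matrix, the update is dense and full-rank in general, unlike a rank-limited LoRA update at the same budget.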
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 11283