Keywords: Parameter-efficient fine-tuning, Large Language Model, Adaptation, Transformer
Abstract: We introduce MoSA, a new parameter-efficient fine-tuning (PEFT) method that replaces low-rank factorization with randomized, fine-grained sharing of weight updates. Each adapted weight matrix is constructed by broadcasting a small set of learned scalars over a fixed tessellation, a pre-defined grouping of the weight matrix's entries, producing expressive updates under the same parameter budget as low-rank adaptation (LoRA). MoSA requires no architectural changes and can be merged into the base model for zero-overhead inference. Across diverse language understanding and generation tasks, MoSA matches or surpasses strong PEFT baselines under strictly matched budgets. Analyses and ablations indicate that non-local parameter sharing acts as an effective regularizer, and that grouping design and budget allocation govern the expressivity–efficiency trade-off. These results position MoSA as a simple, scalable alternative to LoRA.
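The broadcasting mechanism described in the abstract can be sketched minimally as follows. This is an illustrative NumPy sketch under assumed shapes and names (`groups`, `scalars`, `k` are all hypothetical stand-ins, not the paper's actual implementation): each entry of the weight update is taken from one of `k` learned scalars according to a fixed random group assignment, and the resulting update is merged into the frozen base weight.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, k = 8, 8, 16  # k learned scalars set the parameter budget

# Fixed "tessellation": each weight entry is randomly assigned to one of k groups.
# This assignment is frozen before training and never updated.
groups = rng.integers(0, k, size=(d_out, d_in))

# Learned scalars (random stand-ins here; in training these would be optimized).
scalars = rng.normal(size=k)

# Broadcast the scalars over the tessellation to form the dense weight update.
delta_W = scalars[groups]  # shape (d_out, d_in); at most k distinct values

# Merge into the frozen base weight, so inference adds no overhead.
W_base = rng.normal(size=(d_out, d_in))
W_merged = W_base + delta_W
```

Because `delta_W` contains at most `k` distinct values shared non-locally across the matrix, the update is dense and full-rank in general, unlike a rank-limited LoRA update at the same budget.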
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 11283