Keywords: State-based Fine-Tuning, Mixture-of-Control, Sparse Mixture of Experts
TL;DR: We propose Mixture-of-Control (MoC), a lightweight MoE-inspired routing framework that treats block-wise control states as experts to enable efficient cross-layer communication during fine-tuning with substantial memory savings.
Abstract: State-based fine-tuning has emerged as a compelling alternative to weight-based adaptation for transformers: it applies lightweight control updates to hidden states rather than to model weights, offering substantial memory savings while retaining parameter efficiency. However, most existing state-based methods apply only per-block control updates, which limits inter-block information exchange and restricts representational adaptation. Meanwhile, prior mechanisms that enable cross-block communication often introduce considerable computational overhead, reducing their practicality for efficient fine-tuning. We introduce Mixture-of-Control (MoC), a lightweight fine-tuning framework that adaptively integrates local and global control signals to enhance representation learning. MoC treats block-wise control states as experts in a sparse mixture-of-experts process, enabling efficient communication across transformer blocks. Empirical results across diverse transformer-based benchmarks demonstrate that MoC outperforms prior state-based methods while maintaining comparable memory and computational efficiency.
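To make the core mechanism concrete, below is a minimal, hypothetical PyTorch sketch of the sparse routing the abstract describes: per-block control states are treated as experts, and a lightweight top-k router mixes a few of them into a global control signal for the current block. All names here (`MoCRouter`, `control_states`, `top_k`) are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of MoC-style sparse routing over block-wise control states.
# Assumption: each transformer block holds one learnable control state, and a
# top-k router selects which blocks' states to mix into a global control signal.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoCRouter(nn.Module):
    """Routes a block's hidden state over the pool of per-block control states."""

    def __init__(self, hidden_dim: int, num_blocks: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Lightweight linear gate: hidden state -> logits over block "experts".
        self.gate = nn.Linear(hidden_dim, num_blocks, bias=False)

    def forward(self, hidden: torch.Tensor, control_states: torch.Tensor) -> torch.Tensor:
        # hidden:         (batch, hidden_dim)  current block's representation
        # control_states: (num_blocks, hidden_dim)  one control state per block
        logits = self.gate(hidden)                        # (batch, num_blocks)
        topk_vals, topk_idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1)            # renormalize over top-k
        selected = control_states[topk_idx]               # (batch, top_k, hidden_dim)
        # Sparse mixture of the selected control-state experts.
        return (weights.unsqueeze(-1) * selected).sum(dim=1)


# Usage: the routed global control would be combined with the block's local
# control update; only the tiny gate and control states require gradients.
if __name__ == "__main__":
    batch, num_blocks, dim = 4, 12, 64
    router = MoCRouter(dim, num_blocks, top_k=2)
    hidden = torch.randn(batch, dim)
    controls = torch.randn(num_blocks, dim)  # per-block control states
    print(router(hidden, controls).shape)    # torch.Size([4, 64])
```

Because only the top-k control states are gathered per example, cross-block communication stays sparse, which is consistent with the memory and compute efficiency the abstract claims.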
Submission Number: 83