Keywords: State-based Fine-Tuning, Mixture-of-Control, Sparse Mixture of Experts
TL;DR: We propose Mixture-of-Control (MoC), a lightweight MoE-inspired routing framework that treats block-wise control states as experts to enable efficient cross-layer communication during fine-tuning with substantial memory savings.
Abstract: State-based fine-tuning has emerged as a compelling alternative to weight-based adaptation for transformers: it applies lightweight control updates to hidden states rather than to model weights, offering substantial memory savings while retaining parameter efficiency. However, most existing state-based methods apply only per-block control updates, which limits inter-block information exchange and restricts representational adaptation. Meanwhile, prior mechanisms that enable cross-block communication often introduce considerable computational overhead, reducing their practicality for efficient fine-tuning. We introduce Mixture-of-Control (MoC), a lightweight fine-tuning framework that adaptively integrates local and global control signals to enhance representation learning. MoC treats block-wise control states as experts in a sparse mixture-of-experts process, enabling efficient communication across transformer blocks. Empirical results across diverse transformer-based benchmarks demonstrate that MoC outperforms prior state-based methods while maintaining comparable memory and computational efficiency.
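To make the core mechanism concrete, below is a minimal, hypothetical PyTorch sketch of the sparse routing the abstract describes: per-block control states are treated as experts, and a lightweight top-k router mixes a few of them into a global control signal for the current block. All names here (`MoCRouter`, `control_states`, `top_k`) are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of MoC-style sparse routing over block-wise control states.
# Assumption: each transformer block holds one learnable control state, and a
# top-k router selects which blocks' states to mix into a global control signal.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoCRouter(nn.Module):
    """Routes a block's hidden state over the pool of per-block control states."""

    def __init__(self, hidden_dim: int, num_blocks: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Lightweight linear gate: hidden state -> logits over block "experts".
        self.gate = nn.Linear(hidden_dim, num_blocks, bias=False)

    def forward(self, hidden: torch.Tensor, control_states: torch.Tensor) -> torch.Tensor:
        # hidden:         (batch, hidden_dim)  current block's representation
        # control_states: (num_blocks, hidden_dim)  one control state per block
        logits = self.gate(hidden)                        # (batch, num_blocks)
        topk_vals, topk_idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1)            # renormalize over top-k
        selected = control_states[topk_idx]               # (batch, top_k, hidden_dim)
        # Sparse mixture of the selected control-state experts.
        return (weights.unsqueeze(-1) * selected).sum(dim=1)


# Usage: the routed global control would be combined with the block's local
# control update; only the tiny gate and control states require gradients.
if __name__ == "__main__":
    batch, num_blocks, dim = 4, 12, 64
    router = MoCRouter(dim, num_blocks, top_k=2)
    hidden = torch.randn(batch, dim)
    controls = torch.randn(num_blocks, dim)  # per-block control states
    print(router(hidden, controls).shape)    # torch.Size([4, 64])
```

Because only the top-k control states are gathered per example, cross-block communication stays sparse, which is consistent with the memory and compute efficiency the abstract claims.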
Submission Number: 83