Mixture of Latent Experts Using Tensor Products

TMLR Paper 2947 Authors

01 Jul 2024 (modified: 15 Oct 2024) · Decision pending for TMLR · CC BY 4.0
Abstract: In multi-task learning, the conventional approach involves training a model on multiple tasks simultaneously. However, the training signals from different tasks can interfere with one another, potentially leading to \textit{negative transfer}. To mitigate this, we propose a novel \textit{latent-expert} approach (\texttt{TensorPoly}) that balances parameter efficiency with nuanced routing methods. For the \textit{experts}, we reparameterize Low-Rank Adaptation (\texttt{LoRA}) as an entangled tensor constructed through tensor product operations, and name the resulting approach \texttt{TLoRA}. For the \textit{routing function}, we tailor two routing functions according to granularity: \texttt{TensorPoly-I} routes to each rank within the entangled tensor, while \texttt{TensorPoly-II} offers a finer-grained routing approach that targets each order of the entangled tensor. Experimental results on the multi-task T0 benchmark demonstrate that: 1) all latent-expert approaches surpass their corresponding dense counterparts, highlighting the potential of modular language models to mitigate negative transfer in multi-task learning and deliver superior outcomes; 2) \texttt{TensorPoly-I} achieves higher parameter efficiency in adaptation and outperforms other modular LMs, demonstrating the potential of our approach for multi-task transfer learning.\footnote{The code is released at \url{https://github.com/microsoft/mttl}.}
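To make the abstract's description more concrete, below is a minimal, self-contained PyTorch sketch of the general idea: a linear layer's LoRA-style update is reparameterized as a mixture over latent "rank" experts, where each expert's update is built as an order-N tensor (Kronecker) product of small factors and a learned routing vector over the ranks mixes them (TensorPoly-I-granularity routing). All names, shapes, and the exact composition rule here are illustrative assumptions, not the released implementation (see https://github.com/microsoft/mttl).

```python
import torch
import torch.nn as nn


class TLoRALinearSketch(nn.Module):
    """Linear layer with a tensor-product ("entangled") low-rank update (sketch)."""

    def __init__(self, d_in: int, d_out: int, rank: int = 4, order: int = 2):
        super().__init__()
        # Assumption: d_in and d_out are perfect `order`-th powers, so each
        # expert's (d_out x d_in) update factorizes into `order` small (b x a) blocks.
        a = round(d_in ** (1.0 / order))
        b = round(d_out ** (1.0 / order))
        assert a ** order == d_in and b ** order == d_out, "dims must factorize"
        self.base = nn.Linear(d_in, d_out)  # stands in for the frozen pretrained layer
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.order = order
        # One stack of small factors per latent rank: shape (rank, order, b, a).
        self.factors = nn.Parameter(0.01 * torch.randn(rank, order, b, a))
        # Routing logits over the `rank` latent experts (TensorPoly-I granularity).
        self.routing_logits = nn.Parameter(torch.zeros(rank))

    def delta_weight(self) -> torch.Tensor:
        alphas = torch.softmax(self.routing_logits, dim=0)  # mixture weights over ranks
        delta = torch.zeros_like(self.base.weight)
        for r in range(self.factors.shape[0]):
            block = self.factors[r, 0]
            for n in range(1, self.order):
                # "Entangle" the small factors via Kronecker products.
                block = torch.kron(block, self.factors[r, n])
            delta = delta + alphas[r] * block  # (d_out, d_in)
        return delta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.delta_weight().T


if __name__ == "__main__":
    layer = TLoRALinearSketch(d_in=16, d_out=64, rank=4, order=2)
    print(layer(torch.randn(2, 16)).shape)  # torch.Size([2, 64])
```

Under these assumptions, the adapter's trainable parameters scale with `rank * order * b * a` rather than with `rank * (d_in + d_out)` as in standard LoRA, which is the parameter-efficiency angle the abstract alludes to; a TensorPoly-II-style variant would instead route at the granularity of each of the `order` factors rather than each rank.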
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We updated the analysis of training time and parameter size during the fine-tuning process.
Code: https://github.com/microsoft/mttl
Assigned Action Editor: ~Ran_He1
Submission Number: 2947