Keywords: Fine-tuning, low-rank approximation, tensor networks, LLMs
TL;DR: Global adapters for fine-tuning based on tensor train decompositions
Abstract: We present MetaTT, a Tensor Train (TT) adapter framework for fine-tuning pre-trained transformers. MetaTT enables flexible and parameter-efficient model adaptation by using a single shared TT to factorize transformer sub-modules. This factorization indexes key structural dimensions, including layer and matrix type, and can optionally incorporate heads and tasks. As a result, MetaTT's parameter count scales with the sum, rather than the product, of the modes, yielding a substantially more compact adapter. Our benchmarks compare MetaTT with LoRA and with recent state-of-the-art matrix- and tensor-decomposition-based fine-tuning methods. On standard single-task language modeling benchmarks, MetaTT achieves a competitive parameter-efficiency-to-accuracy tradeoff. We further demonstrate that MetaTT performs competitively with state-of-the-art methods on multi-task learning. Finally, we leverage the TT ansatz to design a rank-adaptive optimizer inspired by the DMRG method from many-body physics. Our results demonstrate that integrating this approach with AdamW enhances optimization performance for a specified target rank.
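To make the scaling claim concrete, here is a minimal sketch (not the authors' implementation) of a single shared TT adapter whose modes index (layer, matrix type, input dim, output dim). The class name, mode ordering, ranks, and initialization are illustrative assumptions; the point is that the trainable parameters grow with the sum of the mode sizes (times TT ranks), not their product.

```python
# Hypothetical sketch of a shared TT adapter; shapes and names are assumptions,
# not the paper's code. One tensor train with modes (layer, matrix_type, d_in,
# d_out) parameterizes every weight update; slicing the structural modes and
# contracting the rest materializes the update for one sub-module.
import torch
import torch.nn as nn


class SharedTTAdapter(nn.Module):
    def __init__(self, n_layers, n_types, d_in, d_out, rank):
        super().__init__()
        modes = [n_layers, n_types, d_in, d_out]
        ranks = [1, rank, rank, rank, 1]  # boundary TT ranks are 1
        self.cores = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(ranks[k], modes[k], ranks[k + 1]))
             for k in range(len(modes))]
        )

    def delta_w(self, layer, mtype):
        # Select the (layer, matrix type) slices of the first two cores ...
        g = self.cores[0][:, layer, :] @ self.cores[1][:, mtype, :]  # (1, r)
        # ... then contract the remaining cores into a (d_in, d_out) update.
        t = torch.einsum('ar,rib->ib', g, self.cores[2])             # (d_in, r)
        return torch.einsum('ib,bjc->ij', t, self.cores[3])          # (d_in, d_out)


adapter = SharedTTAdapter(n_layers=12, n_types=4, d_in=768, d_out=768, rank=8)
print(adapter.delta_w(layer=3, mtype=1).shape)  # torch.Size([768, 768])
```

In this sketch the parameter count is roughly r*(n_layers + n_types) + r^2*(d_in + d_out), i.e. additive in the mode sizes, whereas per-module low-rank adapters multiply with the number of layers and matrix types.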
Primary Area: foundation or frontier models, including LLMs
Submission Number: 14012