Keywords: Parameter-efficient fine-tuning, Representation tuning, LLM efficiency
TL;DR: We introduce an efficient and flexible fine-tuning scheme that enlarges the dimension of the latent embedding.
Abstract: The widespread adoption of large pretrained models has made fine-tuning an essential step in tailoring models to specific tasks. As these models continue to grow in scale and as the demand for task-specific and personalized adaptation increases, parameter-efficient fine-tuning (PEFT) has emerged as a practical alternative to full fine-tuning, enabling effective adaptation while updating only a small fraction of the total parameters. Although various PEFT techniques achieve strong performance, many still incur increased inference latency and are inefficient in multi-adapter scenarios. Motivated by these limitations, we propose a novel PEFT approach that leverages auxiliary representations to enable fast and flexible inference. In our method, Latent Task Embedding fine-tuning, a small task-specific latent embedding is concatenated to the original embedding; the corresponding weight matrices are extended, and only the additional parameters introduced by this expansion are trained. This design permits efficient inference with a single matrix multiplication per weight matrix, minimizing latency overhead, and supports task-specific masking so that multiple adapters can be handled within a single model. We evaluate our method on large language models and latent diffusion models, demonstrating accuracy competitive with existing PEFT baselines while providing faster inference and enabling efficient intra-batch multi-task processing.
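To make the mechanism in the abstract concrete, below is a minimal PyTorch sketch of the general idea: a frozen pretrained weight is enlarged with trainable extension blocks, a small task latent is concatenated to the representation, and inference remains a single matrix multiplication per layer. The block layout, the initialization, and all names (ExtendedLinear, k_extra, z) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ExtendedLinear(nn.Module):
    """A frozen pretrained linear layer whose input/output dimensions are
    enlarged by k_extra; only the newly added rows/columns are trainable."""

    def __init__(self, pretrained: nn.Linear, k_extra: int):
        super().__init__()
        d_out, d_in = pretrained.weight.shape
        self.k = k_extra
        # Original weight stays frozen (bias omitted for brevity).
        self.W = nn.Parameter(pretrained.weight.detach().clone(),
                              requires_grad=False)
        # Trainable extension: new input columns for the original outputs.
        # Zero init keeps the pretrained outputs unchanged at initialization
        # (an assumption about a reasonable starting point).
        self.W_in = nn.Parameter(torch.zeros(d_out, k_extra))
        # ... and new output rows covering the full extended input.
        self.W_new = nn.Parameter(torch.randn(k_extra, d_in + k_extra) * 0.01)

    def forward(self, x):  # x: (..., d_in + k_extra)
        # Assemble the extended weight as a block matrix; inference is then a
        # single (d_out + k) x (d_in + k) matrix multiplication per layer.
        top = torch.cat([self.W, self.W_in], dim=1)    # (d_out, d_in + k)
        W_ext = torch.cat([top, self.W_new], dim=0)    # (d_out + k, d_in + k)
        return x @ W_ext.T


# Usage sketch: append a trainable task latent to every token representation,
# then pass it through the extended layer. Only z, W_in, and W_new are updated.
d, k, B, T = 768, 16, 2, 5
layer = ExtendedLinear(nn.Linear(d, d, bias=False), k)
z = nn.Parameter(torch.zeros(k))                    # task-specific latent
h = torch.randn(B, T, d)                            # frozen-model activations
h_ext = torch.cat([h, z.expand(B, T, k)], dim=-1)   # (B, T, d + k)
out = layer(h_ext)                                  # (B, T, d + k)
```

Because the extra k dimensions are kept separate from the pretrained block, one could in principle mask them per example to route different samples in a batch to different task adapters; how the paper realizes this masking is not specified here and the above is only a guess at the mechanism.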
Primary Area: generative models
Submission Number: 8469