On the Implicit Relation between Low-Rank Adaptation and Differential Privacy

Published: 11 Oct 2024, Last Modified: 10 Nov 2024M3L PosterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Task adaptation, differential privacy, large language models
TL;DR: We draw an implicit connection between low-rank task adaptation of language models and differential privacy.
Abstract: A significant approach in natural language processing involves large-scale pre-training on general domain data followed by adaptation to specific tasks or domains. As models grow in size, full fine-tuning all parameters becomes increasingly impractical. To address this, some methods for low-rank task adaptation of language models have been proposed, e.g. LoRA and FLoRA. These methods keep the pre-trained model weights fixed and incorporate trainable low-rank decomposition matrices into some layers of the transformer architecture, called adapters. This approach significantly reduces the number of trainable parameters required for downstream tasks compared to full fine-tuning all parameters. In this work, we look at low-rank adaptation from the lens of data privacy. We show theoretically that the low-rank adaptation used in LoRA and FLoRA is equivalent to injecting some random noise into the batch gradients w.r.t the adapter parameters coming from their full fine-tuning, and we quantify the variance of the injected noise. By establishing a Berry-Esseen type bound on the total variation distance between the noise distribution and a Gaussian distribution with the same variance, we show that the dynamics of LoRA and FLoRA are very close to differentially private full fine-tuning the adapters, which suggests that low-rank adaptation implicitly provides privacy w.r.t the fine-tuning data. Finally, using Johnson-Lindenstrauss lemma, we show that when augmented with gradient clipping, low-rank adaptation is almost equivalent to differentially private full fine-tuning adapters with a fixed noise scale.
Is Neurips Submission: No
Submission Number: 30
Loading