ObfuscaTune: Obfuscated Offsite Fine-tuning and Inference of Proprietary LLMs on Private Datasets

ACL ARR 2024 June Submission 5293 Authors

16 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: This work addresses the timely yet underexplored problem of performing inference and fine-tuning of a proprietary LLM, owned by a model provider entity, on the confidential/private data of another data owner entity, in a way that ensures the confidentiality of both the model and the data. The fine-tuning is conducted offsite, i.e., on the computation infrastructure of a third-party cloud provider. We tackle this problem by proposing \texttt{ObfuscaTune}, a novel, efficient, and fully utility-preserving approach that combines a simple yet effective obfuscation technique with an efficient use of confidential computing (only $\sim 5\%$ of the model parameters are placed in a Trusted Execution Environment, TEE). We empirically demonstrate the effectiveness of \texttt{ObfuscaTune} by validating it on GPT-2 models of different sizes on four NLP benchmark datasets. Finally, we compare against a naive version of our approach to highlight the necessity of using random matrices with low condition numbers, which reduce the errors induced by the obfuscation.
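The role of the condition number can be made concrete with a minimal sketch. This is not the paper's actual construction: the helper name, the GPT-2-small hidden size, and the choice of orthogonal matrices (condition number exactly 1, the limiting case of "low condition number") are illustrative assumptions. The idea is that a secret invertible matrix, held inside the TEE, scrambles the proprietary weights before they leave trusted hardware; because the matrix is well conditioned, the obfuscate/de-obfuscate round trip adds almost no floating-point error, preserving utility.

```python
import numpy as np

def random_orthogonal(n, rng):
    # Hypothetical helper: orthogonal matrices have condition number 1,
    # so the round trip through obfuscation adds minimal numerical error.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

rng = np.random.default_rng(0)
d = 768                                  # e.g., GPT-2 small hidden size
W = rng.standard_normal((d, d)) * 0.02   # stand-in for a proprietary weight matrix

R = random_orthogonal(d, rng)  # secret matrix, kept inside the TEE
W_obf = W @ R                  # obfuscated weights shipped to the untrusted cloud

x = rng.standard_normal((1, d))  # an activation vector
y_obf = x @ W_obf                # untrusted hardware computes on obfuscated weights only
y = y_obf @ R.T                  # TEE de-obfuscates (R inverse = R transpose here)

print(np.max(np.abs(y - x @ W)))  # on the order of 1e-13: utility preserved
```

With an ill-conditioned random matrix in place of `R`, the same round trip amplifies floating-point error, which is the failure mode the naive baseline in the abstract is meant to expose.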
Paper Type: Short
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: Privacy, Large language models, Confidentiality, Model stealing
Languages Studied: English
Submission Number: 5293