Keywords: proxy tuning, transfer learning, efficient fine-tuning
TL;DR: We propose using intermediate-sized models to train small proxy experts in a way that anticipates future ensembling with a larger general LLM.
Abstract: While large language models (LLMs) have been shown to benefit from fine-tuning on downstream data, on-device settings face compute and data-privacy constraints that make directly fine-tuning these models infeasible. In settings where a large black-box (target) model cannot run on-device but can be remotely queried at deployment time, proxy tuning (PT) is a natural solution: fine-tune a small white-box (proxy) model and combine its predictions with those of the target. However, when the pretrained knowledge of the two models differs significantly, PT can, surprisingly, perform worse than using either the target or the proxy alone. To improve the transferability of the proxy model without assuming training-time access to the target model, we propose IPT (Intermediate Proxy Tuning), which guides tuning of the proxy with an intermediate model. On three NLP tasks (language modeling, topic classification, and question answering), IPT improves post-transfer accuracy by up to 5% and lowers perplexity (PPL) by up to 6 points over naive PT.
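The decode-time combination that PT performs can be sketched as logit arithmetic: the difference between the tuned and untuned proxy's next-token logits is added to the target's logits. The sketch below is a minimal illustration under that assumption; the function names and toy logit vectors are hypothetical, not from the submission.

```python
import numpy as np

def proxy_tuned_logits(target_logits, tuned_proxy_logits, base_proxy_logits):
    """Proxy-tuning-style logit combination for one decoding step.

    The (tuned - base) proxy difference acts as an offset that steers
    the large target model toward the fine-tuned behavior, without
    touching the target's weights.
    """
    return target_logits + (tuned_proxy_logits - base_proxy_logits)

def softmax(x):
    # Numerically stable softmax over a 1-D logit vector.
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

# Toy 5-token vocabulary; all logit values are illustrative.
target = np.array([2.0, 1.0, 0.5, 0.0, -1.0])  # black-box target logits
tuned  = np.array([0.5, 2.0, 0.0, 0.0, -0.5])  # fine-tuned proxy logits
base   = np.array([0.5, 0.5, 0.0, 0.0, -0.5])  # untuned proxy logits

# Combined logits: [2.0, 2.5, 0.5, 0.0, -1.0]; the proxy's fine-tuning
# shifts probability mass toward token 1.
probs = softmax(proxy_tuned_logits(target, tuned, base))
```

Here the target alone would pick token 0, but the proxy offset flips the combined distribution's argmax to token 1, which is exactly the failure surface the abstract describes: when proxy and target disagree strongly, the offset can also hurt.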
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 35