Knowledge Distillation of Black-Box Large Language Models

ACL ARR 2024 June Submission5023 Authors

16 Jun 2024 (modified: 03 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Given the exceptional performance of proprietary large language models (LLMs) like GPT-4, recent research has increasingly focused on boosting the capabilities of smaller models through knowledge distillation (KD) from these powerful yet black-box teachers. While leveraging the high-quality outputs of these teachers is advantageous, the inaccessibility of their internal states often limits effective knowledge transfer. To overcome this limitation, we introduce Proxy-KD, a novel method that uses a proxy model to facilitate the efficient transfer of knowledge from black-box LLMs to smaller models. Our experiments show that Proxy-KD not only enhances the performance of KD from black-box teacher models but also surpasses traditional white-box KD techniques. This approach presents a compelling new avenue for distilling knowledge from advanced LLMs.
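To make the proxy idea concrete, the sketch below illustrates one plausible reading of the abstract, not the authors' exact algorithm: a white-box proxy is first aligned to the black-box teacher's sampled outputs (only its generated tokens are observable), and the student is then distilled from the proxy's soft logits with a standard temperature-scaled KL objective. All model shapes, hyperparameters, and helper names here are illustrative placeholders.

```python
# Minimal sketch of proxy-based black-box distillation (assumed two-stage setup).
import torch
import torch.nn.functional as F

def align_proxy_step(proxy, teacher_token_ids, optimizer):
    """Stage 1 (assumed): fit the white-box proxy to token sequences
    generated by the black-box teacher via next-token cross-entropy."""
    inputs, targets = teacher_token_ids[:, :-1], teacher_token_ids[:, 1:]
    logits = proxy(inputs)                              # (batch, seq, vocab)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

def distill_student_step(student, proxy, token_ids, optimizer, T=2.0):
    """Stage 2 (assumed): white-box KD from the aligned proxy to the
    student using temperature-scaled KL divergence over the vocabulary."""
    inputs = token_ids[:, :-1]
    with torch.no_grad():
        proxy_logits = proxy(inputs) / T
    student_logits = student(inputs) / T
    loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.log_softmax(proxy_logits, dim=-1),
                    log_target=True, reduction="batchmean") * T * T
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# Toy usage with stand-in models (an embedding plus a linear LM head):
if __name__ == "__main__":
    vocab, dim = 1000, 64
    make_lm = lambda: torch.nn.Sequential(
        torch.nn.Embedding(vocab, dim), torch.nn.Linear(dim, vocab))
    proxy, student = make_lm(), make_lm()
    batch = torch.randint(0, vocab, (4, 33))            # fake teacher generations
    align_proxy_step(proxy, batch,
                     torch.optim.AdamW(proxy.parameters(), lr=1e-4))
    distill_student_step(student, proxy, batch,
                         torch.optim.AdamW(student.parameters(), lr=1e-4))
```

The design intuition, as the abstract describes it, is that the proxy bridges the gap: the black-box teacher exposes only text, so only the proxy can supply the logit-level supervision that white-box KD methods rely on.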
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: distillation
Contribution Types: Approaches for low compute settings-efficiency
Languages Studied: English
Submission Number: 5023