Knowledge Distillation for Closed-Source Language Models

20 Sept 2023 (modified: 11 Feb 2024) | Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Knowledge Distillation, Language Model
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Closed-source language models such as GPT-4 have achieved remarkable performance. Many recent studies aim to enhance the capabilities of smaller models through knowledge distillation (KD) from these closed-source models. However, because the output distribution of a closed-source language model cannot be accessed directly, KD can currently be performed only with one-hot labels, which limits its effectiveness. To address this limitation, we propose a Bayesian estimation-based knowledge distillation method comprising prior estimation and posterior estimation. Prior estimation obtains a prior distribution by leveraging the corpus generated by the closed-source language model. Posterior estimation updates the prior distribution, based on continued sampling results, to obtain a posterior distribution. We then use both the prior and posterior distributions for distillation. Experimental results show that, for KD from closed-source language models, our method outperforms current KD methods that directly fine-tune on one-hot labels.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2382
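
The abstract does not give the estimators themselves, so the following is only a minimal sketch of the general idea of prior estimation, posterior estimation, and distillation on soft targets, for a single next-token position. It assumes a Dirichlet-multinomial update, which may differ from the paper's actual formulation. All names (`prior_from_corpus`, `posterior_from_samples`, `distillation_loss`) and parameters (`smoothing`, `prior_strength`, `weight`) are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def prior_from_corpus(observed_tokens, vocab, smoothing=1.0):
    """Estimate a prior next-token distribution from teacher-generated text.

    Assumption: additive smoothing of the observed counts stands in for
    the paper's prior estimation. All observed tokens must be in `vocab`.
    """
    counts = Counter(observed_tokens)
    total = len(observed_tokens) + smoothing * len(vocab)
    return {tok: (counts[tok] + smoothing) / total for tok in vocab}


def posterior_from_samples(prior, sampled_tokens, prior_strength=4.0):
    """Update the prior with counts from continued sampling of the teacher.

    Assumption: the prior is treated as a Dirichlet with total pseudo-count
    `prior_strength`; the returned distribution is the posterior mean.
    All sampled tokens must be keys of `prior`.
    """
    counts = Counter(sampled_tokens)
    total = prior_strength + len(sampled_tokens)
    return {tok: (prior_strength * p + counts[tok]) / total
            for tok, p in prior.items()}


def soft_cross_entropy(target, student):
    """Cross-entropy of the student distribution against a soft target."""
    return -sum(p * math.log(student[tok]) for tok, p in target.items() if p > 0)


def distillation_loss(prior, posterior, student, weight=0.5):
    """Combine both soft targets; the mixing `weight` is a hypothetical choice."""
    return (weight * soft_cross_entropy(prior, student)
            + (1.0 - weight) * soft_cross_entropy(posterior, student))


# Toy example with a three-token vocabulary.
vocab = ["yes", "no", "maybe"]
prior = prior_from_corpus(["yes"], vocab)  # the one-hot label was "yes"
posterior = posterior_from_samples(prior, ["yes", "yes", "no", "yes"])
student = {"yes": 0.6, "no": 0.3, "maybe": 0.1}  # hypothetical student output
loss = distillation_loss(prior, posterior, student)
```

In this toy setup the one-hot label is softened into a prior, and each additional sample from the teacher moves the target toward the empirical sampling frequencies, so the student is trained on a distribution instead of a single token.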