Keywords: finetuning, data efficiency, synthetic data, distillation, LLMs, SLMs
TL;DR: We study how to make finetuning with distilled synthetic data more efficient by incorporating the SLM's feedback and the LLM's rationales.
Abstract: Finetuning small language models (SLMs) on data generated by large language models (LLMs), a form of knowledge distillation, has recently been shown to significantly enhance the capabilities of small models across various domains (e.g., mathematical reasoning). However, current approaches typically require synthesizing a large number of new examples ($>100\textrm{K}$), which increases the resources and training time needed for finetuning. To address this issue, we investigate principles for making the distillation process more efficient by reducing the amount of synthetic data required. Specifically, we explore (i) incorporating the SLM's feedback into the LLM's data generation process and (ii) including the LLM's rationales (i.e., step-by-step solutions) in the distilled data. In our experiments using the Mistral-7B model as the SLM on math reasoning tasks (GSM8K, MATH), we find that both feedback and rationales can help make finetuning with distillation more efficient (by requiring up to $\sim2\text{x}$ less synthetic data).
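To make the two ideas concrete, below is a minimal Python sketch of a feedback-driven distillation loop: the teacher LLM only synthesizes rationale-augmented examples for problems the student SLM currently fails. All names here (`slm_solve`, `llm_generate_with_rationale`, `is_correct`) are hypothetical placeholders, not the paper's actual implementation.

```python
def slm_solve(problem: str) -> str:
    """Placeholder: the student SLM's attempted answer."""
    return "42"

def is_correct(answer: str, reference: str) -> bool:
    """Placeholder: exact-match answer check (GSM8K-style)."""
    return answer.strip() == reference.strip()

def llm_generate_with_rationale(problem: str) -> dict:
    """Placeholder: teacher LLM produces a step-by-step rationale plus a final answer."""
    return {"rationale": "Step 1: ... Step 2: ...", "answer": "7"}

def build_distillation_set(train_pairs):
    """Spend the synthesis budget only where the SLM's feedback says it is needed,
    and keep the teacher's rationale in the finetuning target."""
    distilled = []
    for problem, reference in train_pairs:
        if is_correct(slm_solve(problem), reference):
            continue  # SLM already solves this one; skip to save synthetic data
        ex = llm_generate_with_rationale(problem)
        # Target includes the step-by-step rationale, not just the final answer
        distilled.append({"input": problem,
                          "target": ex["rationale"] + "\n#### " + ex["answer"]})
    return distilled

if __name__ == "__main__":
    pairs = [("What is 3 + 4?", "7"), ("What is 6 * 7?", "42")]
    print(build_distillation_set(pairs))
```

In this sketch, filtering on the student's failures is what reduces the number of synthesized examples, while keeping rationales in the target is what makes each example more informative; the paper studies both levers.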
Submission Number: 11