Keywords: distillation, data-efficient training, NLP in resource-constrained settings
TL;DR: An efficient knowledge-distillation prompt for generating the training set used to train a context compressor.
Abstract: This paper focuses on efficient data compression for Large Language Models. Given the linear context growth of self-evaluating and divide-and-conquer LLM modeling methods, techniques are needed to manage the size of shared context. Existing approaches compress data through prompt tuning, using detailed instructions to guide the output. However, this approach may be suboptimal: (i) explicitly specified principles may restrict an LLM's inherent ability to compress data; (ii) longer prompts increase the overhead of processing the data.
To address these issues, we built upon the framework proposed by LLMLingua2, which formulates data compression as a token classification problem and trains knowledge-distilled models on data generated with compression prompts. We analyzed their model's outputs, designed new prompts targeting areas for improvement, and evaluated them on downstream tasks such as summarization, question answering, and mathematics. We then tested our best prompting method on the MeetingBank summarization task: at 3% of the size of LLMLingua2's prompt, it achieves a 61% reduction in the size of the distilled data and higher model evaluation results than LLMLingua2's prompting method on all eight metrics, at a low-resource level of 1,000 training pairs.
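For readers unfamiliar with the token-classification formulation, the sketch below illustrates the general recipe rather than the paper's implementation: a teacher LLM's compressed outputs are converted into per-token keep/drop labels, which are then used to fine-tune a small classifier that performs the compression. The backbone model, the word-matching labeling heuristic, and the training loop are assumptions made for illustration only.

```python
# Minimal sketch of prompt compression as token classification
# (illustrative only; backbone, labeling heuristic, and training loop are assumptions).
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")  # assumed backbone
model = AutoModelForTokenClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2)  # label 0 = drop, label 1 = keep

def make_labels(original: str, compressed: str):
    """Heuristic distillation labels: mark a word 'keep' if it survives in the
    teacher LLM's compressed text (real pipelines use a more careful alignment)."""
    kept = set(compressed.split())
    words = original.split()
    return words, [1 if w in kept else 0 for w in words]

def encode(example):
    words, word_labels = make_labels(example["original"], example["compressed"])
    enc = tokenizer(words, is_split_into_words=True, truncation=True,
                    padding="max_length", max_length=256, return_tensors="pt")
    labels = torch.full(enc["input_ids"].shape, -100, dtype=torch.long)  # ignore specials/padding
    for i, wid in enumerate(enc.word_ids(0)):
        if wid is not None:
            labels[0, i] = word_labels[wid]
    enc["labels"] = labels
    return {k: v.squeeze(0) for k, v in enc.items()}

# `pairs` would be produced by running the compression prompt through a teacher LLM.
pairs = [{"original": "The quarterly meeting was held on Monday to review the budget.",
          "compressed": "meeting Monday review budget"}]
loader = DataLoader([encode(p) for p in pairs], batch_size=1, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
model.train()
for batch in loader:
    loss = model(**batch).loss  # token-level cross-entropy over keep/drop labels
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

At inference time, a classifier trained this way scores each token of the input and retains only those predicted as "keep", so compression quality depends directly on the distilled training pairs that the prompt produces.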
Archival Status: Non‑archival
Paper Length: Short Paper (up to 4 pages of content)
Submission Number: 352