Resource-Efficient and Model-Independent Data Selection Framework for Instruction Fine-Tuning

ACL ARR 2024 June Submission 4980 Authors

16 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Large language models (LLMs) possess powerful capabilities and play a crucial role in daily life. Instruction fine-tuning is essential for training LLMs, enabling them to understand human instructions and produce the desired output. Selecting appropriate data for instruction fine-tuning is critical but challenging; existing data selection methods struggle to balance effectiveness and efficiency in real-world scenarios. Given that instruction fine-tuning requires models to respond to a wide variety of questions, we assess the quality of instruction fine-tuning samples by focusing on their outputs. In this work, we propose a novel data selection framework that evaluates data from unknown sources based on its output. To guide the model in distinguishing instruction fine-tuning data, we train a discriminator that uses outputs from models of varying quality as supervision signals. We establish principles for evaluating model quality, asserting that a model's quality is higher if it is a newer version, has more parameters, and achieves higher scores on well-known benchmarks. In this way, the discriminator learns the differences between outputs of different models, enabling it to assign unknown data to the model whose outputs it most resembles. We conduct experiments demonstrating that our method is resource-efficient and model-independent.
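The abstract describes training a discriminator on outputs from models ranked by quality and then routing unknown samples to the tier they most resemble. The sketch below illustrates that general idea only; the quality tiers, the TF-IDF features, and the logistic-regression classifier are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch: responses generated by models of different quality tiers serve as
# supervision labels; a classifier then assigns unknown instruction-tuning samples
# to the tier whose outputs they most resemble.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Supervision signal: outputs from models ranked by the stated quality principles
# (newer version, more parameters, higher benchmark scores -> higher tier).
# The example responses here are placeholders.
tiered_outputs = {
    2: ["Photosynthesis converts light energy into chemical energy stored as glucose, using chlorophyll in the chloroplasts."],
    1: ["Photosynthesis is how plants make food from sunlight."],
    0: ["plants eat sun"],
}
texts, labels = [], []
for tier, outputs in tiered_outputs.items():
    texts.extend(outputs)
    labels.extend([tier] * len(outputs))

# Train the discriminator on (response text -> quality tier).
discriminator = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
discriminator.fit(texts, labels)

# Score unknown instruction-tuning samples by the tier their output resembles;
# keep only those predicted to match the highest-quality tier.
candidate_outputs = ["Photosynthesis transforms solar energy into glucose via chlorophyll."]
predicted_tiers = discriminator.predict(candidate_outputs)
selected = [o for o, t in zip(candidate_outputs, predicted_tiers) if t == 2]
print(selected)
```

In practice the features and classifier could be swapped for embeddings from a language model; the sketch only shows how model-quality tiers can act as supervision without any human labeling of the candidate data.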
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: fine-tuning
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 4980