Keywords: Efficient SFT, Data Reuse, Text Mixup
Abstract: Supervised Fine-Tuning (SFT) adapts pre-trained Large Language Models (LLMs) to domain-specific instructions by training on a carefully curated subset of high-quality instruction–response pairs, typically drawn from a larger dataset that often contains many low-quality or noisy samples. Despite its effectiveness, this quality-first paradigm suffers from two limitations. On the one hand, quality filters are inherently imperfect: many samples that pass through them are not truly high-quality. On the other hand, discarding the vast majority of low-quality or frequently occurring examples risks losing potentially valuable signal. As much of the readily available instruction-following data online has already been utilized, further improvements now depend on leveraging, rather than discarding, the examples that were previously filtered out. To address these two issues, we introduce ENTP, which stands for Enhancing low-quality SFT data via Neural-symbolic Text Purge-Mix. Like the ENTP personality type in the MBTI, ENTP creatively enhances low-quality data by purging (removing noisy information) and mixing (with information extracted from all available data and from model knowledge). Specifically, the symbolic component identifies and isolates low-quality raw corpora using statistical priors, while the connectionist component extracts latent representations to guide the reconstruction of missing or corrupted information. This synergy generates hybrid instruction–response pairs that augment informational value while preserving corpus diversity. Our experiments demonstrate that fine-tuning LLMs on data augmented by ENTP, derived solely from the low-quality subset, consistently outperforms 13 established data-selection methods across 5 standard instruction-following benchmarks. Notably, it can even surpass fine-tuning on the full original dataset (≈300K examples). Our findings show that ostensibly low-quality data is a critical resource; leveraging it through intelligent purification and synthesis is key to efficient and effective instruction alignment.
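Illustrative sketch (not the paper's implementation): the abstract describes a purge-then-mix pipeline at a high level, so the toy Python below shows one way such a pipeline could be wired together. The quality-scoring heuristics, purge rules, the `Pair` type, and the `reconstruct` placeholder for a model call are all assumptions introduced here for demonstration only.

```python
# Toy "purge-and-mix" sketch for low-quality instruction-response pairs.
# All heuristics below are illustrative assumptions, not the ENTP method itself.
import re
from dataclasses import dataclass


@dataclass
class Pair:
    instruction: str
    response: str


def quality_score(pair: Pair) -> float:
    """Symbolic stand-in: score a pair with simple statistical priors
    (share of clean characters and a response-length prior)."""
    text = pair.instruction + " " + pair.response
    if not text.strip():
        return 0.0
    alpha_ratio = sum(c.isalnum() or c.isspace() for c in text) / len(text)
    length_prior = min(len(pair.response.split()) / 50.0, 1.0)
    return 0.5 * alpha_ratio + 0.5 * length_prior


def purge(pair: Pair) -> Pair:
    """Purge step: strip obviously noisy spans (HTML tags, repeated punctuation)."""
    clean = lambda s: re.sub(r"<[^>]+>|([!?.])\1{2,}", " ", s)
    return Pair(clean(pair.instruction).strip(), clean(pair.response).strip())


def mix(low: Pair, anchor: Pair, reconstruct) -> Pair:
    """Mix step: blend the purged pair with a higher-quality neighbor;
    `reconstruct` is a placeholder for a model call that fills in
    missing or corrupted information."""
    draft = (low.response + " " + anchor.response).strip()
    return Pair(low.instruction, reconstruct(draft))


if __name__ == "__main__":
    noisy = Pair("Explain photosynthesis???!!!", "<div>plants make food</div>")
    anchor = Pair("Explain photosynthesis.",
                  "Plants convert light, water, and CO2 into glucose and oxygen.")
    if quality_score(noisy) < 0.8:  # only low-quality pairs are purged and mixed
        enhanced = mix(purge(noisy), anchor, reconstruct=lambda s: s.strip())
        print(enhanced)
```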
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 9604