Exploiting Hardness and Diversity for Data-Efficient Fine-Tuning

ACL ARR 2026 January Submission 5022 Authors

05 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, License: CC BY 4.0
Keywords: Data-efficient fine-tuning, Mathematical reasoning, Data selection, Semantic diversity, Large language models
Abstract: Fine-tuning large language models for mathematical reasoning is typically performed on large training sets, even though many examples become redundant once a model is already instruction-tuned. Under practical compute and time constraints, it is therefore important to understand which training examples actually matter. We investigate this question on GSM8K by fine-tuning Gemma-2-2B-it with LoRA under a fixed data budget. We compare uniform random sampling with two structured data selection methods. A taxonomy-based approach, Skill-Balanced Sampling (SBS), enforces balanced coverage across predefined skill categories but yields only modest and inconsistent gains. We then propose Hardness-Weighted Diversity (HWD), which explicitly controls the proportion of easy, medium, and hard examples while promoting semantic diversity. Our empirical results show clear performance saturation well before the full dataset is used. Moreover, HWD achieves the best performance using only 9% of the GSM8K training data, outperforming both random sampling and SBS despite training on substantially fewer examples.
Paper Type: Long
Research Area: Mathematical, Symbolic, Neurosymbolic, and Logical Reasoning
Research Area Keywords: Reasoning, Data Selection, Efficient Training of Language Models
Contribution Types: NLP engineering experiment, Approaches to low-compute settings / efficiency, Data analysis
Languages Studied: English
Submission Number: 5022
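
Note: The abstract describes HWD only at a high level (a fixed easy/medium/hard mix plus semantic diversity). The Python fragment below is a minimal, hypothetical sketch of one way such a selector could look; the per-example hardness labels, the 20/40/40 hardness mix, the greedy farthest-point diversity step, and the function name select_hwd are illustrative assumptions and are not confirmed by the submission.

    # Minimal, hypothetical sketch of HWD-style selection (not the authors' code).
    import numpy as np

    def select_hwd(embeddings, hardness, budget, proportions=(0.2, 0.4, 0.4), seed=0):
        """Return `budget` training indices with a fixed hardness mix and
        greedy semantic diversity inside each hardness bucket.

        embeddings:  (N, d) array of sentence embeddings for the candidate examples
        hardness:    length-N array with values "easy", "medium", or "hard"
        budget:      total number of examples to select
        proportions: assumed (easy, medium, hard) shares of the budget
        """
        rng = np.random.default_rng(seed)
        hardness = np.asarray(hardness)
        selected = []
        for level, share in zip(("easy", "medium", "hard"), proportions):
            pool = np.flatnonzero(hardness == level)
            k = min(int(round(share * budget)), len(pool))
            if k == 0:
                continue
            # L2-normalise so dot products are cosine similarities.
            emb = embeddings[pool].astype(float)
            emb /= np.linalg.norm(emb, axis=1, keepdims=True)
            # Greedy farthest-point sampling: start from a random example, then
            # repeatedly add the example least similar to anything chosen so far.
            chosen = [int(rng.integers(len(pool)))]
            min_sim = emb @ emb[chosen[0]]
            for _ in range(k - 1):
                min_sim[chosen] = np.inf  # never re-select an already chosen point
                cand = int(np.argmin(min_sim))
                chosen.append(cand)
                min_sim = np.minimum(min_sim, emb @ emb[cand])
            selected.extend(pool[chosen].tolist())
        return selected

    # Example call: roughly 9% of GSM8K's ~7.5k training problems is ~670 examples.
    # subset = select_hwd(embeddings, hardness_labels, budget=670)

The selected count may fall slightly short of the budget when a hardness bucket is smaller than its assumed share; how the paper handles that case (and how hardness is measured) is not stated in the abstract.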