Complexity-aware fine-tuning

ACL ARR 2025 May Submission 6687 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: General-purpose Large Language Models (LLMs) are frequently fine-tuned to improve performance in niche domains. Although fine-tuning is standard practice, we still lack a deep understanding of how to aggregate data for better results. In this work, we show that entropy-based output estimation provides a meaningful guideline for fine-tuning data preparation. Specifically, across two small open models ($\approx 3B$), we find that single-token answer entropy yields a ROC AUC score of $\approx 0.73$ and allows us to split the training data into three complexity categories to which different tuning mechanisms can be applied. As a result, we propose a novel blueprint for efficient fine-tuning that outperforms the standard approach (0.5/0.6 vs. 0.4/0.46 accuracy). We also provide an in-depth analysis of alternative complexity estimation techniques based on expert assessment via model-as-judge (MASJ) and chain-of-thought entropy aggregation, which achieve ROC AUC scores of 0.57 and 0.7, respectively. Our findings show immediate enhancements in fine-tuning performance. We publish our code and data to facilitate further investigation and adoption of numerical complexity analysis.
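The abstract's core signal is the entropy of the model's distribution over a single answer token, thresholded into three complexity buckets. The sketch below is a minimal illustration of that idea, not the authors' released code: the model name, prompt format, and bucket thresholds are placeholder assumptions.

```python
# Minimal sketch (not the authors' released code): estimate question complexity
# from the entropy of the model's next-token distribution at the answer position,
# then bucket training examples into three complexity categories.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct"  # placeholder for any ~3B open model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.eval()

def single_token_answer_entropy(prompt: str) -> float:
    """Entropy (in nats) of the distribution over the first answer token."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits at the next-token position
    probs = torch.softmax(logits.float(), dim=-1)
    return float(-(probs * torch.log(probs + 1e-12)).sum())

def complexity_bucket(entropy: float, low: float = 1.0, high: float = 3.0) -> str:
    """Assign one of three complexity categories; thresholds are illustrative."""
    if entropy < low:
        return "easy"
    if entropy < high:
        return "medium"
    return "hard"

# Usage: score each training example, then route buckets to different tuning mechanisms.
prompt = "Question: What is the capital of France?\nAnswer with a single word:"
h = single_token_answer_entropy(prompt)
print(f"entropy={h:.3f}, bucket={complexity_bucket(h)}")
```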
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: data-efficient training, LLM efficiency, calibration/uncertainty, fine-tuning, transfer learning / domain adaptation, NLP datasets, metrics
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 6687