Abstract: Recent advances in foundational large language models (LLMs) have challenged the widely recognized scaling laws, chiefly by prompting a reinterpretation of the relationship between model scale, data scale, and model capabilities. This paper proposes a novel research perspective that treats the model's holistic weights as a system variable. By applying a preliminary, subtle scaling to the model during supervised fine-tuning (SFT), a method we refer to as pre-scaling, we systematically investigate the relationship between performance evolution and model variation. Building on this approach, we conduct extensive experiments across various pre-trained language models (PLMs), revealing discrete features of the model: loss particles and output particles. Through empirical investigation and theoretical analysis, we characterize the fundamental process and statistical properties of particle fission during SFT. Drawing on the inherent properties of output particles, we establish a coupling relationship between these particles and sample importance. Based on this insight, we propose a simple and efficient data selection method named Pre-Scaling Pruning (PSP), which comprises two strategies: $\mathrm{PSP_{one-shot}}$ and $\mathrm{PSP_{zero-shot}}$. Notably, at a pruning ratio of 50%, the data subset selected by $\mathrm{PSP_{one-shot}}$ achieves a higher average GLUE score than the full dataset, demonstrating that high-quality data subsets can not only reduce computational overhead but also enhance the model's generalization capability.
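The abstract does not spell out the PSP procedure, so the following is only a minimal, hypothetical Python (PyTorch) sketch of what a pre-scaling-based selection score could look like: the model's weights are multiplied by a small factor, each sample's output shift under that perturbation is measured, and the highest-scoring fraction of the data is kept. The function name `psp_select`, the scaling factor, the KL-divergence score, and the data interface are illustrative assumptions, not the paper's exact method.

```python
# Hypothetical sketch of pre-scaling-based data selection (not the paper's code).
import copy
import torch
import torch.nn.functional as F


def psp_select(model, dataset, keep_ratio=0.5, scale=1.01, device="cpu"):
    """Return indices of samples retained after a PSP-style pruning step."""
    model = model.to(device).eval()

    # Pre-scaled copy: every weight multiplied by a small, subtle factor.
    scaled = copy.deepcopy(model)
    with torch.no_grad():
        for p in scaled.parameters():
            p.mul_(scale)

    scores = []
    with torch.no_grad():
        for inputs, _ in dataset:
            inputs = inputs.unsqueeze(0).to(device)
            base = F.log_softmax(model(inputs), dim=-1)
            shifted = F.log_softmax(scaled(inputs), dim=-1)
            # Score: divergence between original and pre-scaled outputs,
            # used here as a stand-in for the abstract's "output particle" signal.
            scores.append(
                F.kl_div(shifted, base, log_target=True, reduction="batchmean").item()
            )

    # Keep the samples whose outputs move the most under pre-scaling.
    k = int(len(scores) * keep_ratio)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```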
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: data-efficient training, generalization, data influence, scaling, fine-tuning
Languages Studied: English
Submission Number: 4098