Keywords: Mid Training, Data Selection, Data Stream
Abstract: Target-aware data selection aims to improve performance on a specific target objective by selecting the training samples that are most beneficial to a small target set. Existing methods largely assume static access to the full candidate set, which enables global scoring and ranking. In practice, however, such offline access is rarely available: candidate data typically arrives as a stream over time, which precludes global re-scoring and motivates \emph{online target-aware data selection}, in which selection and model updates are performed repeatedly on small incoming batches. This regime introduces two key challenges: (i) the target-side signal used to guide selection, typically instantiated via target gradients, drifts as the model evolves, so reusing stale signals leads to misaligned selection decisions, while frequent recomputation from a small target set yields high-variance, destabilizing estimates; and (ii) the online training dynamics may become \emph{non-convergent}, resulting in late-stage performance degradation on the target even when the selection rule is repeatedly refreshed. To address these challenges, we analyze the dynamics of online target-aware selection and identify a bias–variance tradeoff in tracking drifting target signals, as well as structural sources of non-convergence. Guided by this analysis, we propose a lightweight stabilization framework that refreshes the target signal only when target performance plateaus and applies exponential moving average (EMA) smoothing to control estimation variance. To further mitigate non-convergence and late-stage degradation, we introduce a refresh-aware early-stopping mechanism that halts training when repeated refreshes fail to yield a measurable improvement on the target, indicating that the remaining stream has exhausted its marginal utility.
Experiments show that our proposed framework consistently improves online selection across multiple tasks and model scales, while reliably preventing late-stage collapse and yielding stronger final target performance.
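The control logic described in the abstract (plateau-triggered refresh, EMA smoothing, refresh-aware early stopping) can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions: candidates are scored by the dot product between their gradients and the target signal, the EMA is applied across successive target-gradient refreshes, a "plateau" means `patience` evaluations without the best target loss improving by `min_delta`, and a refresh counts as unproductive if the best target loss has not improved by `min_delta` since the previous refresh. The class `OnlineTargetSelector` and all of its parameters are hypothetical names chosen for illustration.

```python
import numpy as np


class OnlineTargetSelector:
    """Hypothetical sketch: plateau-triggered refresh of an EMA-smoothed
    target gradient, plus refresh-aware early stopping (assumed details)."""

    def __init__(self, init_target_grad, ema_decay=0.9, patience=3,
                 min_delta=1e-3, max_failed_refreshes=2):
        self.signal = np.asarray(init_target_grad, dtype=float)
        self.ema_decay = ema_decay            # assumed EMA smoothing coefficient
        self.patience = patience              # evals without gain => plateau
        self.min_delta = min_delta            # smallest loss gain that counts
        self.max_failed = max_failed_refreshes
        self.best = float("inf")              # best target loss seen so far
        self.best_at_refresh = float("inf")   # best loss when last refresh fired
        self.stale = 0                        # evals since last improvement
        self.failed = 0                       # consecutive unproductive refreshes
        self.stopped = False

    def select(self, cand_grads, k):
        """Indices of the k candidates whose gradients align best with
        the current target signal (dot-product scoring, an assumption)."""
        scores = np.asarray(cand_grads, dtype=float) @ self.signal
        return np.argsort(-scores)[:k]

    def observe(self, target_loss, fresh_target_grad_fn):
        """Track plateaus; refresh the target signal or stop when warranted."""
        if self.stopped:
            return
        if target_loss < self.best - self.min_delta:
            self.best = target_loss
            self.stale = 0
        else:
            self.stale += 1
        if self.stale < self.patience:
            return  # no plateau yet: keep the current (possibly stale) signal
        # Plateau reached: did the previous refresh yield a measurable gain?
        if self.best_at_refresh - self.best < self.min_delta:
            self.failed += 1
        else:
            self.failed = 0
        if self.failed >= self.max_failed:
            self.stopped = True  # repeated refreshes stopped helping
            return
        # Refresh: blend a fresh (noisy) target-gradient estimate into the EMA.
        g = np.asarray(fresh_target_grad_fn(), dtype=float)
        self.signal = self.ema_decay * self.signal + (1 - self.ema_decay) * g
        self.best_at_refresh = self.best
        self.stale = 0
```

In a streaming loop, one would call `select` on each incoming batch's per-sample gradients, train on the chosen samples, evaluate the target loss, and pass it to `observe` together with a callable that recomputes the target gradient; training ends once `stopped` becomes true. Refreshing only at plateaus limits how often the high-variance small-set estimate is recomputed, while the EMA further damps its noise at the cost of some lag behind the drifting signal.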
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 16