Training Data Soft Selection via Joint Density Ratio Estimation

Published: 01 Sept 2025, Last Modified: 18 Nov 2025, ACML 2025 Conference Track, CC BY 4.0
Abstract: This paper studies the training data selection problem, focusing on selecting effective samples to improve model training when the data are affected by distributional shifts (i.e., data drifts). Existing drift-detection-based methods struggle with local drifts, while recent drift-localization-based methods lack theoretical support and are often ineffective. To address these issues, this paper proposes TSJD, a training data soft selection method based on joint density ratio estimation. TSJD assigns a training weight to each sample (i.e., soft-selects it) based on the estimated joint density ratio, so that the selected data align with the recent data distribution. Because each sample is evaluated independently of time, TSJD effectively handles local data drifts. We also provide theoretical guarantees by deriving an upper bound on the generalization error of models trained on data selected by TSJD. In numerical experiments on four real-world datasets, TSJD shows great versatility, achieving results better than or comparable to those of the baseline methods in every experiment.
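The abstract describes soft selection only at a high level, so the following is a minimal sketch of the general idea under stated assumptions, not TSJD itself. It swaps in a standard classifier-based density-ratio estimator: a logistic regression is trained to tell recent (x, y) pairs apart from the historical pool, and its odds, multiplied by the pool-to-recent sample-size ratio, approximate w(x, y) = p_recent(x, y) / p_pool(x, y). Weighting each pooled sample's loss by w makes the pooled risk estimate the risk under the recent distribution, since E_pool[w(x, y) loss] = E_recent[loss]. The helper names (`fit_logistic`, `joint_density_ratio_weights`, `weighted_least_squares`), the linear logit, and the synthetic two-concept data are all hypothetical; a linear logit only captures log-linear ratios, so complex local drifts would need a richer classifier.

```python
import numpy as np

def fit_logistic(Z, t, lr=0.5, n_iter=3000, l2=1e-3):
    """Logistic regression by full-batch gradient descent; last coef is the bias."""
    Zb = np.hstack([Z, np.ones((len(Z), 1))])
    coef = np.zeros(Zb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Zb @ coef)))
        coef -= lr * (Zb.T @ (p - t) / len(t) + l2 * coef)
    return coef

def joint_density_ratio_weights(X_pool, y_pool, X_recent, y_recent):
    """Hypothetical helper: estimate p_recent(x, y) / p_pool(x, y) per pooled sample."""
    Z_pool = np.column_stack([X_pool, y_pool])  # joint (x, y) vectors
    Z_recent = np.column_stack([X_recent, y_recent])
    Z = np.vstack([Z_pool, Z_recent])
    t = np.r_[np.zeros(len(Z_pool)), np.ones(len(Z_recent))]  # label 1 = recent
    mu, sd = Z.mean(axis=0), Z.std(axis=0) + 1e-12  # standardise features
    coef = fit_logistic((Z - mu) / sd, t)
    logit = np.hstack([(Z_pool - mu) / sd, np.ones((len(Z_pool), 1))]) @ coef
    # Odds P(recent | z) / P(pool | z), corrected by the sample-size ratio.
    w = np.exp(logit) * len(Z_pool) / len(Z_recent)
    return w / w.mean()  # rescale so the average weight is 1

def weighted_least_squares(X, y, w):
    """Linear regression minimising sum_i w_i * (y_i - x_i^T beta - b)^2."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.solve(Xb.T @ (w[:, None] * Xb), Xb.T @ (w * y))

# Synthetic pool mixing concept A (y = x) and concept B (y = x + 3);
# the recent data follow concept B only.
rng = np.random.default_rng(0)
x_pool = rng.normal(size=(400, 1))
is_b = np.arange(400) >= 200
y_pool = x_pool[:, 0] + 3.0 * is_b + 0.3 * rng.normal(size=400)
x_recent = rng.normal(size=(100, 1))
y_recent = x_recent[:, 0] + 3.0 + 0.3 * rng.normal(size=100)

w = joint_density_ratio_weights(x_pool, y_pool, x_recent, y_recent)
beta_uniform = weighted_least_squares(x_pool, y_pool, np.ones(400))
beta_soft = weighted_least_squares(x_pool, y_pool, w)
print("mean weight A:", w[~is_b].mean(), "mean weight B:", w[is_b].mean())
print("intercept uniform:", beta_uniform[1], "soft-selected:", beta_soft[1])
```

In this toy run, samples from the outdated concept receive small weights, so the soft-selected fit moves its intercept from roughly the pooled average toward the recent concept's offset of 3, with no hard cut on timestamps.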
Supplementary Material: pdf
Submission Number: 246