Abstract: Recent advances in recommendation systems have highlighted the critical importance of data quality to model performance. In this paper, we propose a reinforcement learning-based data weight optimization framework, termed RLWORec, that enhances data quality for both small recommendation models and large language model (LLM) fine-tuning scenarios. By dynamically assigning continuous importance weights to training samples via a policy-gradient method under the Proximal Policy Optimization (PPO) framework, our approach identifies and filters noisy data while preserving informative samples. Unlike traditional data selection methods that rely on static scoring mechanisms, RLWORec adaptively learns sample importance through iterative optimization driven by global performance feedback. Extensive experiments on three real-world datasets demonstrate that RLWORec consistently outperforms state-of-the-art data selection baselines, achieving superior recommendation performance with significantly less training data. Our method enables small models to exceed full-dataset performance using only carefully selected subsets, and large models to achieve comparable results with merely 2% of the original training data.
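To make the weighting mechanism concrete, below is a minimal PyTorch sketch of a continuous sample-weight policy trained with PPO's clipped surrogate objective. It is an illustrative sketch under stated assumptions, not the paper's implementation: the `WeightPolicy` network, the `ppo_step` helper, and the choice of per-sample features and advantages (e.g., validation-metric improvement used as the global reward) are all hypothetical.

```python
# Hypothetical sketch of PPO-based continuous sample weighting; not the
# authors' released code. Names and architecture choices are assumptions.
import torch
import torch.nn as nn

class WeightPolicy(nn.Module):
    """Maps per-sample features to a Gaussian over continuous importance weights."""
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
        # Learned (state-independent) log standard deviation for exploration.
        self.log_std = nn.Parameter(torch.zeros(1))

    def forward(self, feats: torch.Tensor) -> torch.distributions.Normal:
        # Sigmoid keeps the weight mean in (0, 1): near 0 filters a sample,
        # near 1 keeps it at full importance.
        mean = torch.sigmoid(self.backbone(feats)).squeeze(-1)
        std = self.log_std.exp().expand_as(mean)
        return torch.distributions.Normal(mean, std)

def ppo_step(policy: WeightPolicy, optimizer: torch.optim.Optimizer,
             feats: torch.Tensor, old_log_probs: torch.Tensor,
             actions: torch.Tensor, advantages: torch.Tensor,
             clip_eps: float = 0.2) -> float:
    """One PPO clipped-surrogate update on the sample-weight policy."""
    dist = policy(feats)
    log_probs = dist.log_prob(actions)
    ratio = (log_probs - old_log_probs).exp()
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Negative sign: we maximize the clipped surrogate objective.
    loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage (hypothetical outer loop): sample weights a ~ policy(feats), train the
# recommender on the correspondingly weighted loss, measure the change in a
# validation metric as the global reward, convert rewards to advantages, and
# call ppo_step(...) to update the weighting policy iteratively.
```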
External IDs: doi:10.1109/tce.2026.3655431