Keywords: Recommender Systems, Reinforcement Learning, Information Retrieval
TL;DR: We propose ADQO for recommender systems, balancing exploration-exploitation via fine-grained user modeling to boost low-activity user conversion and high-activity retention.
Abstract: Recommender systems, as core components of modern digital platforms, leverage reinforcement learning (RL) paradigms to optimize long-term user experience through exploration-exploitation trade-offs. While existing studies implement differentiated policies via coarse-grained user grouping, they face critical challenges in dynamically evolving scenarios: capturing fine-grained user state transitions and establishing precise exploration-exploitation balancing mechanisms. Empirical analysis of an existing dataset shows that over 40% of users experience activity-level transitions within four weeks, highlighting the need for dynamic optimization. To address this, we propose Activity-Driven Quantile Optimization (ADQO), which integrates a general value critic network for user activity modeling and a quantile critic network to finely characterize the distribution of recommendation values, capturing the stochasticity of user feedback. A dynamic policy implements high-potential exploration for low-activity users and low-risk exploitation for high-activity users: it optimizes the upper quantiles for low-activity users to uncover latent interests, and the lower quantiles for high-activity users to mitigate risk. We further introduce two alignment losses to enhance training stability and consistency. Experiments on three datasets demonstrate ADQO's superior performance, effectively converting low-activity users to higher activity states and retaining high-activity users, validating its practical applicability. Our data analysis and training code are shared at https://anonymous.4open.science/r/ADQO-6DC9/.
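The activity-conditioned quantile mechanism described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, quantile thresholds, and the shape of the critic output are all assumptions; the actual ADQO quantile critic and policy are defined in the paper and released code.

```python
import numpy as np

def select_action(quantile_values, low_activity, upper_q=0.9, lower_q=0.1):
    """Score candidate items by a quantile of their value distribution.

    quantile_values: array of shape (n_actions, n_quantiles), each row an
    ascending set of quantile estimates of one action's return (a stand-in
    for a quantile critic's output; illustrative only).
    Low-activity users score by an optimistic upper quantile (high-potential
    exploration); high-activity users score by a conservative lower quantile
    (low-risk exploitation).
    """
    n_quantiles = quantile_values.shape[1]
    tau = upper_q if low_activity else lower_q
    # Map the quantile level tau to a column index, clamped to a valid range.
    idx = min(int(tau * n_quantiles), n_quantiles - 1)
    scores = quantile_values[:, idx]
    return int(np.argmax(scores))
```

Under this scoring, a low-activity user can be shown an item whose mean value is modest but whose upside is large, while a high-activity user is steered toward items whose worst-case value is still acceptable.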
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 4816