Keywords: Recommender Systems, Reinforcement Learning, Information Retrieval
TL;DR: We propose ADQO for recommender systems, balancing exploration-exploitation via fine-grained user modeling to boost low-activity user conversion and high-activity retention.
Abstract: Recommender systems, as core components of modern digital platforms, leverage reinforcement learning (RL) paradigms to optimize long-term user experience through exploration-exploitation trade-offs. While existing studies implement differentiated policies via coarse-grained user grouping, they face critical challenges in dynamically evolving scenarios: capturing fine-grained user state transitions and establishing precise exploration-exploitation balancing mechanisms. Empirical analysis of an existing dataset shows that over 40% of users experience activity-level transitions within four weeks, highlighting the need for dynamic optimization. To address this, we propose Activity-Driven Quantile Optimization (ADQO), which integrates a general value critic network for user activity modeling and a quantile critic network to finely characterize the distribution of recommendation values, capturing the stochasticity of user feedback. A dynamic policy implements high-potential exploration for low-activity users and low-risk exploitation for high-activity users: it optimizes the upper quantiles for low-activity users to uncover latent interests, and the lower quantiles for high-activity users to mitigate risk. We further introduce two alignment losses to enhance training stability and consistency. Experiments on three datasets demonstrate ADQO's superior performance, effectively converting low-activity users to higher activity states and retaining high-activity users, validating its practical applicability. Our data analysis and training code are shared at https://anonymous.4open.science/r/ADQO-6DC9/.
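The activity-conditioned quantile mechanism described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, quantile thresholds, and the shape of the critic output are all assumptions; the actual ADQO quantile critic and policy are defined in the paper and released code.

```python
import numpy as np

def select_action(quantile_values, low_activity, upper_q=0.9, lower_q=0.1):
    """Score candidate items by a quantile of their value distribution.

    quantile_values: array of shape (n_actions, n_quantiles), each row an
    ascending set of quantile estimates of one action's return (a stand-in
    for a quantile critic's output; illustrative only).
    Low-activity users score by an optimistic upper quantile (high-potential
    exploration); high-activity users score by a conservative lower quantile
    (low-risk exploitation).
    """
    n_quantiles = quantile_values.shape[1]
    tau = upper_q if low_activity else lower_q
    # Map the quantile level tau to a column index, clamped to a valid range.
    idx = min(int(tau * n_quantiles), n_quantiles - 1)
    scores = quantile_values[:, idx]
    return int(np.argmax(scores))
```

Under this scoring, a low-activity user can be shown an item whose mean value is modest but whose upside is large, while a high-activity user is steered toward items whose worst-case value is still acceptable.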
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 4816