Abstract: Accuracy has been the primary benchmark for assessing recommenders learned from sequential interactions. To improve the user experience through diverse and novel recommendations, this paper focuses on Multi-objective Sequential Recommendation (MOSR), which balances these conflicting objectives. Although a few studies have leveraged reinforcement learning (RL) for MOSR, such methods can yield sub-optimal results. First, traditional offline RL approaches typically optimize the objectives independently via multiple RL heads, accumulating prediction errors and leading to unstable performance. Furthermore, the offline policy cannot dynamically adjust objective weights at inference time, limiting its adaptability to varying contexts. To this end, we introduce the Multi-objective Decision Transformer for Reward-driven Recommendation (MODT4R), a novel framework that casts MOSR as a sequence modeling problem. First, we propose a user trajectory that captures user state transitions along with multi-objective interests, represented by sequential expected cumulative rewards (returns). Moreover, the supervised learning paradigm stabilizes training while naturally integrating multi-objective optimization into sequence modeling by using multiple returns as conditional inputs. During inference, a score function adjusts the weights of diversity and novelty. Experimental evaluations on real-world datasets demonstrate that MODT4R significantly enhances diversity and novelty while maintaining accuracy, compared with existing state-of-the-art methods.
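The abstract does not include implementation details, but a minimal sketch may help illustrate the two ideas it mentions: conditioning a Decision-Transformer-style sequence model on multiple returns-to-go (accuracy, diversity, novelty), and re-weighting diversity and novelty with a score function at inference. Everything below is an assumption for illustration; the function names (`build_trajectory`, `inference_score`), the three reward signals, and the weighting scheme are hypothetical and not taken from the paper.

```python
# Illustrative sketch only (not the paper's code). Shows how multi-objective
# returns-to-go could be interleaved with interactions to form a
# return-conditioned trajectory, and how an inference-time score function
# could trade off accuracy against diversity and novelty.
import numpy as np


def returns_to_go(rewards):
    """Expected cumulative future reward at each step (suffix sums)."""
    rewards = np.asarray(rewards, dtype=float)
    return np.cumsum(rewards[::-1])[::-1]


def build_trajectory(items, acc_r, div_r, nov_r):
    """Hypothetical trajectory: each step carries the three returns-to-go
    as conditional inputs alongside the interacted item."""
    g_acc, g_div, g_nov = map(returns_to_go, (acc_r, div_r, nov_r))
    return [
        {"returns": (g_acc[t], g_div[t], g_nov[t]), "item": items[t]}
        for t in range(len(items))
    ]


def inference_score(candidate_scores, w_div=0.3, w_nov=0.2):
    """Hypothetical inference-time score function: adjust the weights of
    diversity and novelty without retraining the sequence model."""
    acc, div, nov = candidate_scores
    return (1.0 - w_div - w_nov) * acc + w_div * div + w_nov * nov


if __name__ == "__main__":
    traj = build_trajectory(
        items=[101, 57, 42],
        acc_r=[1.0, 0.0, 1.0],   # e.g., click / no-click
        div_r=[0.2, 0.5, 0.1],   # e.g., dissimilarity to recent items
        nov_r=[0.0, 0.3, 0.7],   # e.g., item unpopularity
    )
    print(traj[0])
    print(inference_score((0.9, 0.4, 0.2), w_div=0.5, w_nov=0.1))
```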
External IDs: dblp:journals/tkde/WangKAJG25