Latent Ranked Bandits

02 May 2019 (modified: 05 May 2023). Submitted to RL4RealLife 2019. Readers: Everyone
Keywords: Multi-armed Bandits, Ranking, Latent Bandits, Non-stochastic Rewards
Abstract: We study the problem of learning personalized ranked lists of diverse items for multiple users from sequential observations of user preferences. The user-item preference matrix is non-negative and low-rank. Existing methods for similar problems reconstruct the preference matrix from its noisy observations using matrix factorization techniques, and typically require strong assumptions on the reconstructed matrix. We depart from this standard approach and consider a family of low-rank matrices in which the set of most preferred items of all users is small and can be learned efficiently. We then learn to present this set to each user in a personalized manner, in descending order of that user's preferences. Moreover, in contrast to previous approaches, we assume that the preference matrix is non-stochastic, which makes our approach more general. We propose a computationally efficient algorithm, which we call latent ranker (LRA), that implements this procedure. We evaluate the algorithm empirically on several synthetic and real-world datasets. In all experiments, it outperforms existing state-of-the-art algorithms.