Gaussian Process Bandits for Top-k Recommendations

Mohit Yadav; Cameron N Musco; Daniel Sheldon

Published: 25 Sept 2024, Last Modified: 06 Nov 2024. Venue: NeurIPS 2024 poster. License: CC BY 4.0

Keywords: Gaussian processes, Bandit algorithms, Top-k recommendations, Linear Algebra, Iterative Algorithms

TL;DR: This paper explores Gaussian process bandit algorithms using Kendall kernels for top-k ranking, introducing a new variant of the Kendall kernel, enabling fast inference, and including a regret analysis.

Abstract: Algorithms that utilize bandit feedback to optimize top-k recommendations are vital for online marketplaces, search engines, and content platforms. However, the combinatorial nature of this problem poses a significant challenge, as the possible number of ordered top-k recommendations from $n$ items grows exponentially with $k$. As a result, previous work often relies on restrictive assumptions about the reward or bandit feedback models, such as assuming that the feedback discloses rewards for each recommended item rather than a single scalar feedback for the entire set of top-k recommendations. We introduce a novel contextual bandit algorithm for top-k recommendations, leveraging a Gaussian process with a Kendall kernel to model the reward function. Our algorithm requires only scalar feedback from the top-k recommendations and does not impose restrictive assumptions on the reward structure. Theoretical analysis confirms that the proposed algorithm achieves sub-linear regret in relation to the number of rounds and arms. Additionally, empirical results using a bandit simulator demonstrate that the proposed algorithm outperforms other baselines across various scenarios.
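
To make two ideas from the abstract concrete, the sketch below counts the ordered top-k recommendations from $n$ items, which is $n!/(n-k)!$, and computes the standard Kendall kernel between two full rankings. This is only an illustration under stated assumptions: it uses the classical Kendall kernel on full permutations, not the top-k variant the paper introduces, and the function names `num_ordered_top_k` and `kendall_kernel` are hypothetical.

```python
from itertools import combinations
from math import perm


def num_ordered_top_k(n: int, k: int) -> int:
    """Number of ordered top-k lists drawn from n items: n! / (n - k)!."""
    return perm(n, k)


def kendall_kernel(sigma, tau) -> float:
    """Standard Kendall kernel between two full rankings of the same items.

    sigma[i] and tau[i] are the rank positions of item i (all distinct).
    Returns (concordant pairs - discordant pairs) / (n choose 2), in [-1, 1].
    Assumption: this is the classical full-ranking kernel, not the
    top-k variant proposed in the paper.
    """
    n = len(sigma)
    if n != len(tau) or n < 2:
        raise ValueError("rankings must have equal length of at least 2")
    score = 0
    for i, j in combinations(range(n), 2):
        # A pair is concordant when both rankings order items i and j the same way.
        same_order = (sigma[i] - sigma[j]) * (tau[i] - tau[j]) > 0
        score += 1 if same_order else -1
    return score / (n * (n - 1) / 2)


if __name__ == "__main__":
    print(num_ordered_top_k(10, 3))
    print(kendall_kernel([0, 1, 2], [1, 0, 2]))
```

Even for a modest catalogue the arm count grows quickly with $k$, which is why a kernel that shares information across similar rankings is attractive for a Gaussian process reward model.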

Primary Area: Probabilistic methods (for example: variational inference, Gaussian processes)

Submission Number: 12351
