Graph-Enhanced Hybrid Sampling for Multi-Armed Bandit Recommendation

Published: 01 Jan 2024 · Last Modified: 10 Nov 2024 · ICASSP 2024 · CC BY-SA 4.0
Abstract: Graph-based multi-armed bandit algorithms exploit relationships among users to select the item that maximizes reward, which is determined by items' features and users' unknown preferences. Precise estimation of users' preferences is therefore important and indispensable for bandit sampling, even though it is not the ultimate target. However, existing algorithms generally neglect this point and adopt reward maximization as the objective from the very beginning, feeding inaccurate preference estimates as input, which degrades performance from a long-term perspective. In this paper, we propose a hybrid sampling framework for bandit selection that first focuses purely on estimation quality and only afterwards on reward maximization. Specifically, we propose an 'unsupervised' bandit selection objective that minimizes the expected estimation error; it does not take users' estimated preferences as input, and it suppresses an approximate upper bound on the cumulative regret. We then design a low-complexity selection algorithm that optimizes this objective using simple multiplications between items' features and users' graph relations. Finally, for reward maximization, we cascade a graph-based bandit algorithm that selects subsequent arms on the basis of the warm-starts produced by the first stage. Extensive experiments on different graphs show that the proposed hybrid framework substantially outperforms popular existing methods in recommendation performance.
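To make the two-phase structure concrete, the following is a minimal single-user sketch in Python. It is an illustration under stated assumptions, not the paper's actual method: the greedy A-optimal design criterion for the estimation phase, the LinUCB-style rule for the reward phase, and all names (`select_for_estimation`, `graph_ucb_step`, `T0`, etc.) are our own choices; the paper's exact objective, and the graph coupling of information matrices across users, are not reproduced here.

```python
import numpy as np

# Sketch of the two-phase hybrid framework described in the abstract.
# Phase 1 selects arms to shrink estimation error without using the
# preference estimate; phase 2 runs a standard optimistic bandit rule
# warm-started by phase 1. In the full multi-user method, the users'
# graph (e.g., its Laplacian) would additionally regularize A.

def select_for_estimation(items, A):
    """Phase 1 (assumed greedy A-optimal design): pick the item x that
    minimizes trace((A + x x^T)^{-1}), i.e., the expected estimation
    error proxy. Uses only item features, never the reward estimate,
    matching the 'unsupervised' objective in the abstract."""
    best_item, best_score = None, np.inf
    for x in items:
        score = np.trace(np.linalg.inv(A + np.outer(x, x)))
        if score < best_score:
            best_item, best_score = x, score
    return best_item

def graph_ucb_step(items, A, b, alpha=1.0):
    """Phase 2 (assumed LinUCB-style rule): choose the arm with the
    highest optimistic reward estimate, warm-started by phase 1."""
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b                     # ridge estimate of preferences
    ucb = [x @ theta_hat + alpha * np.sqrt(x @ A_inv @ x) for x in items]
    return items[int(np.argmax(ucb))]

rng = np.random.default_rng(0)
d, n_items, T0, T = 5, 20, 30, 200            # T0 = length of phase 1 (assumed)
items = [rng.normal(size=d) for _ in range(n_items)]
theta_true = rng.normal(size=d)               # unknown user preference
A = np.eye(d)                                 # regularized information matrix
b = np.zeros(d)

for t in range(T):
    x = select_for_estimation(items, A) if t < T0 else graph_ucb_step(items, A, b)
    r = x @ theta_true + 0.1 * rng.normal()   # observed noisy reward
    A += np.outer(x, x)                       # update sufficient statistics
    b += r * x
```

The key design point the sketch reflects is that phase 1's criterion depends only on the accumulated information matrix and candidate features, so the early estimate's inaccuracy cannot bias arm selection; phase 2 then inherits a well-conditioned estimate as its warm-start.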