Contextual and Nonstationary Multi-armed Bandits Using the Linear Gaussian State Space Model for the Meta-Recommender System

Published: 01 Jan 2023, Last Modified: 10 Jun 2024 · SMC 2023 · CC BY-SA 4.0
Abstract: Selecting an optimal recommendation method is crucial for an electronic commerce (EC) site. However, the effectiveness of a recommendation method cannot be known in advance. Although continuous comparative evaluation in a live environment is essential, it incurs opportunity loss. To reduce this loss, the selection task has been studied as a multi-armed bandit (MAB) problem, and adaptive meta-recommender systems (meta-RS) have been devised to automatically and continuously select the best recommendation method according to a policy. Three factors cause opportunity loss: the context of the recommendation method, temporal variation, and response time. However, no existing study formulates an MAB policy that accounts for all three factors, so reducing opportunity loss remains an open problem. We propose an MAB policy that addresses all three causes of opportunity loss by applying a Kalman filter to a linear Gaussian state space model. We conducted extensive experiments on selecting the best recommendation method using data from a real EC site. The results show that, compared with baseline policies, the proposed policy substantially reduces the opportunity loss of the meta-RS during evaluation and increases cumulative clicks.
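The core idea the abstract describes, tracking each arm's time-varying reward with a Kalman filter on a linear Gaussian state space model and choosing arms from the resulting posteriors, can be sketched as follows. This is a minimal illustration under assumptions, not the paper's formulation: the scalar random-walk state model, the noise parameters `q` and `r`, and the use of Thompson sampling for arm selection are all choices made here for clarity, and the sketch omits the contextual and response-time components.

```python
import random

class KalmanArm:
    """Scalar Kalman filter tracking one arm's time-varying mean reward.

    Assumed state space model (not necessarily the paper's):
      state:       mu_t = mu_{t-1} + w_t,  w_t ~ N(0, q)  (random-walk drift)
      observation: r_t  = mu_t + v_t,      v_t ~ N(0, r)
    """
    def __init__(self, q=0.01, r=1.0):
        self.mean, self.var = 0.0, 1.0  # posterior over mu_t is N(mean, var)
        self.q, self.r = q, r

    def predict(self):
        # Drift step: posterior variance grows, discounting old observations,
        # which lets the policy adapt to nonstationary reward distributions.
        self.var += self.q

    def update(self, reward):
        # Standard scalar Kalman correction with gain k.
        k = self.var / (self.var + self.r)
        self.mean += k * (reward - self.mean)
        self.var *= 1.0 - k

def select_arm(arms):
    """Thompson sampling: draw one sample from each arm's posterior
    and play the arm whose sample is largest."""
    for a in arms:
        a.predict()
    samples = [random.gauss(a.mean, a.var ** 0.5) for a in arms]
    return max(range(len(arms)), key=lambda i: samples[i])
```

Because `predict` inflates every arm's variance each round, arms that have not been played for a while are re-explored automatically, which is what makes a filtering approach attractive for nonstationary settings.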