Abstract: Contextual multi-armed bandit (CMAB) problems have recently gained increasing attention due to their ability to use context information to deliver recommendation services. In this paper, we formalize e-commerce recommendation as a CMAB problem and propose a novel CMAB approach based on implicit feedback data such as click and purchase records. We use product categories as arms to reduce the number of arms and leverage user behavior contexts to update the estimate of the expected reward for each arm, so that no negative samples are needed to train the model. As the core of the approach, we design a contextual bandit recommendation algorithm based on Thompson sampling, named IF-TS, which provides real-time responses by learning user preferences online and alleviates the cold start problem by adding non-personalized actions. Experiments on three real-world datasets show that our approach can dynamically update user preferences using implicit context information and achieves good recommendation performance. The experimental results also demonstrate that the proposed algorithm is robust in cold start environments.
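To make the bandit formulation concrete, the sketch below shows a generic Beta-Bernoulli Thompson sampling recommender that treats product categories as arms and updates its posteriors from implicit feedback (a click or purchase counts as reward 1, an impression without interaction as reward 0). This is only an illustrative baseline under assumed names and parameters (the class `BetaBernoulliTS`, the hard-coded category list and click rates); it is not the paper's IF-TS algorithm, which additionally exploits user behavior contexts and non-personalized actions for cold start.

```python
import numpy as np


class BetaBernoulliTS:
    """Illustrative Thompson sampling over category arms (not the paper's IF-TS).

    Each arm keeps a Beta(alpha, beta) posterior over its click/purchase rate,
    updated from implicit feedback observed after each recommendation.
    """

    def __init__(self, categories, rng=None):
        self.categories = list(categories)            # product categories act as arms
        self.alpha = np.ones(len(self.categories))    # posterior successes + 1
        self.beta = np.ones(len(self.categories))     # posterior failures + 1
        self.rng = rng or np.random.default_rng()

    def recommend(self):
        # Sample an expected-reward estimate per arm and recommend the best one.
        samples = self.rng.beta(self.alpha, self.beta)
        return self.categories[int(np.argmax(samples))]

    def update(self, category, reward):
        # reward = 1 for a click/purchase, 0 for an impression with no interaction.
        i = self.categories.index(category)
        self.alpha[i] += reward
        self.beta[i] += 1 - reward


# Hypothetical online loop: recommend a category, observe implicit feedback, update.
bandit = BetaBernoulliTS(["books", "electronics", "clothing"])
true_rates = {"books": 0.05, "electronics": 0.12, "clothing": 0.08}  # simulated, for illustration
for _ in range(1000):
    arm = bandit.recommend()
    clicked = np.random.rand() < true_rates[arm]
    bandit.update(arm, int(clicked))
```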