Abstract: The low-rank structure is one of the most prominent features in modern recommendation problems. In this paper, we consider an online learning problem with a low-rank expected reward matrix where both row features and column features are unknown a priori, and the agent aims to learn to choose the best row-column pair (i.e. the maximum entry) in the matrix. We develop a novel online recommendation algorithm based on ensemble sampling, a recently developed computationally efficient approximation of Thompson sampling. Our computational results show that our algorithm consistently achieves order-of-magnitude improvements over the baselines in both synthetic and real-world experiments.
0 Replies
Loading