Keywords: Boredom, Reinforcement Learning, Recommendation, Diversity, Recommender System, HAI, HRI
TL;DR: We present multi-armed bandit algorithms that learn user boredom thresholds and use them for activity recommendations.
Abstract: We consider sequential recommender systems that operate over multiple sessions with a fixed catalog. Each session opens with a single recommendation.
Acceptance leads to another recommendation. The session ends upon first rejection.
The goal is to maximize session length.
Myopic exploitation of previously-successful recommendations quickly leads to user boredom.
We introduce novel bandit algorithms that improve recommendation variety by learning and enforcing per-user, per-item boredom thresholds.
This allows repeated recommendations, appropriately spaced in time, with a high acceptance rate. Learning takes place in two stages: (i) item-specific boredom thresholds are determined;
(ii) once the thresholds are known, the user's preference for each item is learned via a standard bandit algorithm.
Evaluation using user data from a commercial system demonstrates clear improvements in session length.
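The session protocol and the idea of enforcing boredom thresholds can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the algorithm from the paper: the names `run_session`, `ToyUser`, and `SpacedEpsilonGreedy` are hypothetical, time is counted in recommendation steps, a threshold is modelled as a minimum gap between repeats of the same item, the thresholds are taken as already known (stage (ii) only), and epsilon-greedy stands in for the unspecified "standard bandit algorithm".

```python
import random


def run_session(policy, user, max_steps=1000):
    """Recommend until the first rejection; return the number of acceptances."""
    length = 0
    for _ in range(max_steps):
        item = policy.recommend()
        accepted = user.accepts(item)
        policy.update(item, accepted)
        if not accepted:
            break  # the session ends on the first rejection
        length += 1
    return length


class ToyUser:
    """Assumed user model: rejects an item repeated sooner than its threshold."""

    def __init__(self, thresholds):
        self.thresholds = dict(thresholds)  # item -> minimum gap in steps
        self.step = 0
        self.last_seen = {}

    def accepts(self, item):
        last = self.last_seen.get(item)
        bored = last is not None and self.step - last < self.thresholds[item]
        self.last_seen[item] = self.step
        self.step += 1
        return not bored


class SpacedEpsilonGreedy:
    """Epsilon-greedy over items whose assumed boredom gap has elapsed."""

    def __init__(self, thresholds, epsilon=0.1, seed=0):
        self.thresholds = dict(thresholds)  # assumed known (stage (i) is skipped)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.step = 0
        self.last_shown = {}
        self.shows = {item: 0 for item in self.thresholds}
        self.wins = {item: 0 for item in self.thresholds}

    def eligible(self):
        return [
            item
            for item, gap in self.thresholds.items()
            if item not in self.last_shown or self.step - self.last_shown[item] >= gap
        ]

    def recommend(self):
        # Fall back to the full catalog if every item is still cooling down.
        candidates = self.eligible() or list(self.thresholds)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(candidates)
        # Smoothed acceptance-rate estimate; ties go to the earliest item.
        return max(candidates, key=lambda i: (self.wins[i] + 1) / (self.shows[i] + 2))

    def update(self, item, accepted):
        self.last_shown[item] = self.step
        self.step += 1
        self.shows[item] += 1
        self.wins[item] += int(accepted)


gaps = {"a": 2, "b": 2, "c": 2}

# Policy that respects the gaps versus a myopic one that ignores them (gap 0).
spaced_length = run_session(SpacedEpsilonGreedy(gaps, epsilon=0.0), ToyUser(gaps), max_steps=50)
myopic_length = run_session(
    SpacedEpsilonGreedy({item: 0 for item in gaps}, epsilon=0.0), ToyUser(gaps), max_steps=50
)
```

In this toy setting the spaced policy reaches the 50-step cap, while the myopic policy repeats its first accepted item immediately and the session ends after one acceptance. The numbers reflect only the assumed user model and say nothing about the paper's evaluation.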
Journal Edition Interest: Yes
Supplementary Material: pdf
Submission Number: 50