Learning User Boredom Constraints in Sequential Recommender Systems

Published: 03 Jun 2026, Last Modified: 03 Jun 2026, Venue: ALA 2026, License: CC BY 4.0
Keywords: Boredom, Reinforcement Learning, Recommendation, Diversity, Recommender System, HAI, HRI
TL;DR: We present multi-armed bandit algorithms that learn user boredom thresholds and use them for activity recommendations.
Abstract: We consider sequential recommender systems that operate over multiple sessions with a fixed catalog. Each session opens with a single recommendation. Acceptance leads to another recommendation, and the session ends upon the first rejection. The goal is to maximize session length. Myopic exploitation of previously successful recommendations quickly leads to user boredom. We introduce novel bandit algorithms that improve recommendation variety by learning and enforcing per-user, per-item boredom thresholds. This allows repeated recommendations, appropriately spaced in time, with a high acceptance rate. Learning takes place in two stages: (i) item-specific boredom thresholds are determined; (ii) once the thresholds are known, preference for each item is learned via a standard bandit algorithm. Evaluation using user data from a commercial system demonstrates clear improvements in session length.
Journal Edition Interest: Yes
Supplementary Material: pdf
Submission Number: 50
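
The two-stage scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not specify how thresholds are estimated or which bandit is used, so everything below is an assumption made for illustration. In particular, the sketch assumes that a boredom threshold is the minimum number of recommendation steps that must pass before an item can be repeated, that a repeat inside that gap is always rejected, and that UCB1 stands in for the "standard bandit algorithm". The names SimulatedUser, estimate_threshold, ThresholdedUCB, and run_session are all hypothetical.

```python
import math
import random


class SimulatedUser:
    """Hypothetical user: rejects an item repeated too soon, else accepts with prob. p."""

    def __init__(self, thresholds, accept_probs, seed=0):
        self.thresholds = thresholds      # assumed: min steps between repeats per item
        self.accept_probs = accept_probs  # acceptance probability when not bored
        self.last_seen = {}               # item -> step of its last recommendation
        self.rng = random.Random(seed)

    def respond(self, item, step):
        last = self.last_seen.get(item)
        self.last_seen[item] = step
        if last is not None and step - last < self.thresholds[item]:
            return False  # bored: the item came back too soon
        return self.rng.random() < self.accept_probs[item]


def estimate_threshold(probe, max_gap):
    """Stage (i), simplified: smallest gap in 1..max_gap that probe(gap) accepts.

    Assumes rejections at short gaps are caused by boredom alone.
    """
    for gap in range(1, max_gap + 1):
        if probe(gap):
            return gap
    return max_gap


class ThresholdedUCB:
    """Stage (ii): UCB1 restricted to items whose boredom gap has elapsed."""

    def __init__(self, thresholds):
        self.thresholds = thresholds
        self.counts = [0] * len(thresholds)
        self.successes = [0] * len(thresholds)
        self.last_step = [None] * len(thresholds)

    def eligible(self, step):
        return [i for i, last in enumerate(self.last_step)
                if last is None or step - last >= self.thresholds[i]]

    def select(self, step):
        # Fall back to the full catalog if every item is still "resting".
        candidates = self.eligible(step) or list(range(len(self.thresholds)))
        untried = [i for i in candidates if self.counts[i] == 0]
        if untried:
            return untried[0]
        total = sum(self.counts)
        return max(candidates, key=lambda i: self.successes[i] / self.counts[i]
                   + math.sqrt(2 * math.log(total) / self.counts[i]))

    def update(self, item, step, accepted):
        self.counts[item] += 1
        self.successes[item] += int(accepted)
        self.last_step[item] = step


def run_session(bandit, user, start_step, max_len=50):
    """Recommend until the first rejection; return (session length, next step)."""
    step, length = start_step, 0
    while length < max_len:
        item = bandit.select(step)
        accepted = user.respond(item, step)
        bandit.update(item, step, accepted)
        step += 1
        if not accepted:
            break
        length += 1
    return length, step
```

In this sketch, enforcing the thresholds only filters the candidate set; the preference estimates are ordinary UCB1 statistics. With stochastic acceptance, a real threshold estimator would need repeated probes per gap to separate boredom from dislike, which the simplified estimate_threshold does not attempt.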