Toward joint utilization of absolute and relative bandit feedback for conversational recommendation

Published: 01 Jan 2024 · Last Modified: 08 Feb 2025 · User Modeling and User-Adapted Interaction, 2024 · CC BY-SA 4.0
Abstract: Conversational recommendation has emerged as a promising way for recommenders to address the cold-start problem that traditional recommender systems suffer from. To actively elicit users' dynamically changing preferences, conversational recommender systems periodically query users about their preferences on item attributes and collect conversational feedback. However, most existing conversational recommender systems only allow users to provide one type of feedback, either absolute or relative. In practice, absolute feedback can be biased and imprecise due to users' varying rating criteria. Relative feedback, meanwhile, struggles to reveal users' absolute attitudes. Hence, asking only one type of question throughout the conversation may not elicit users' preferences accurately or efficiently. Moreover, many existing conversational recommender systems only allow binary feedback, which can be noisy when users have no particular inclination. To address these issues, we propose a generalized conversational recommendation framework, the hybrid rating-comparison conversational recommender system. The system can seamlessly ask both absolute and relative questions and incorporate both types of feedback, including possible neutral responses. While it is promising to utilize different types of feedback together, building a joint model that incorporates them is difficult because they carry different interpretations of users' preferences. To ensure that relative feedback can be effectively leveraged, we first propose a bandit algorithm, RelativeConUCB. Building on it, we further propose a new bandit algorithm, ArcUCB, to jointly utilize absolute and relative feedback, with possible neutral responses, for preference elicitation. Experiments on both synthetic and real-world datasets validate the advantage of our proposed methods over existing bandit algorithms for conversational recommender systems.
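To make the joint-feedback idea concrete, the sketch below shows one generic way a linear UCB-style bandit could fold both feedback types into a single ridge-regression estimate: an absolute rating is a direct observation on an item's feature vector, while a relative (comparison) answer is treated as a signed observation on the difference of two feature vectors, with neutral answers contributing a zero target. This is an illustrative assumption on our part, not the paper's RelativeConUCB or ArcUCB algorithm; the class and method names are hypothetical.

```python
import numpy as np

class JointFeedbackLinUCB:
    """Illustrative LinUCB-style elicitation combining absolute ratings
    and relative comparisons. A generic sketch, NOT the paper's ArcUCB."""

    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha           # exploration weight
        self.A = lam * np.eye(dim)   # regularized Gram matrix
        self.b = np.zeros(dim)       # response-weighted feature sum

    def theta(self):
        # Ridge-regression estimate of the preference vector
        return np.linalg.solve(self.A, self.b)

    def select(self, arms):
        # arms: (n, dim) feature matrix; pick the highest-UCB item
        th = self.theta()
        A_inv = np.linalg.inv(self.A)
        bonus = np.sqrt(np.einsum('nd,dk,nk->n', arms, A_inv, arms))
        return int(np.argmax(arms @ th + self.alpha * bonus))

    def update_absolute(self, x, rating):
        # Absolute feedback: an observed rating on item features x
        self.A += np.outer(x, x)
        self.b += rating * x

    def update_relative(self, x_win, x_lose, neutral=False):
        # Relative feedback: a signed observation on the difference
        # vector; a neutral answer contributes a target of zero
        d = x_win - x_lose
        self.A += np.outer(d, d)
        if not neutral:
            self.b += d
```

Under this sketch, both feedback types sharpen the same confidence ellipsoid (via `A`), so even a neutral comparison still reduces uncertainty along the compared direction, which is one intuition for why mixing question types can speed up preference elicitation.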