Factual and Personalized Recommendation Language Modeling with Reinforcement Learning

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Large language model, reinforcement learning, conversational recommender systems
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Develop a personalized and persuasive recommender language model via reinforcement learning
Abstract: Recommender systems (RSs) play a central role in connecting users to content, products and services, matching candidate items to users based on their preferences. While traditional RSs rely on implicit user feedback signals, conversational RSs interact with users in natural language. In this work, we develop a comPelling, Precise, Personalized, Preference-relevant language model (P$^4$LM) that recommends items to users in a way that better explains item characteristics and their relevance to a user's preferences. To do this, P$^4$LM uses the embedding space representation of a user's preferences constructed by a traditional RS to generate compelling responses that are factually-grounded and relevant w.r.t. those preferences. Moreover, we develop a joint reward function that measures precision, appeal, and personalization, which we use as AI-based feedback for reinforcement learning-based language modeling. Using MovieLens data, we show that P$^4$LM can deliver compelling, personalized movie narratives to users.
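The abstract describes a joint reward over precision, appeal, and personalization used as AI-based feedback for reinforcement learning-based language modeling. The sketch below is a minimal illustration of that general idea, not the authors' method: it assumes the joint reward is a weighted combination of per-response component scores and that fine-tuning uses a REINFORCE-style policy-gradient objective (neither detail is specified here). All function names, weights, and tensor shapes are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's implementation): a joint reward
# combining precision, appeal, and personalization scores, fed into a
# REINFORCE-style policy-gradient loss for fine-tuning a language model.
import torch


def joint_reward(precision: torch.Tensor,
                 appeal: torch.Tensor,
                 personalization: torch.Tensor,
                 weights=(1.0, 1.0, 1.0)) -> torch.Tensor:
    """Weighted sum of per-response reward components (assumed form)."""
    w_p, w_a, w_u = weights
    return w_p * precision + w_a * appeal + w_u * personalization


def reinforce_loss(log_probs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """Policy-gradient loss: -E[(R - b) * sum_t log pi(y_t | y_<t, context)].

    `log_probs`: (batch, seq_len) token log-probabilities of sampled responses.
    `rewards`:   (batch,) scalar joint rewards for those responses.
    """
    seq_log_prob = log_probs.sum(dim=-1)   # log-probability of each sampled response
    baseline = rewards.mean()              # simple mean baseline for variance reduction
    return -((rewards - baseline) * seq_log_prob).mean()


if __name__ == "__main__":
    # Toy usage: random numbers stand in for model outputs and reward-model scores.
    batch, seq_len = 4, 16
    log_probs = torch.log(torch.rand(batch, seq_len))  # placeholder token log-probs
    r = joint_reward(torch.rand(batch), torch.rand(batch), torch.rand(batch))
    loss = reinforce_loss(log_probs, r)
    print(f"policy-gradient loss: {loss.item():.4f}")
```

In practice the component scores would come from learned reward models (and the user-preference embedding from a traditional RS would condition the generator), but those pieces are outside the scope of this sketch.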
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3163